Abstract



Query equivalence is investigated for disjunctive aggregate queries with negated subgoals, 
constants and comparisons. A full characterization of equivalence is given for the aggregation 
functions count, max, sum, prod, top2 and parity. A related problem is that of determining, 
for a given natural number N, whether two given queries are equivalent over all databases with 
at most N constants. We call this problem bounded equivalence. A complete characterization 
of decidability of bounded equivalence is given. In particular, it is shown that this problem is 
decidable for all the above aggregation functions as well as for cntd (count distinct) and avg. 
For quasilinear queries (i.e., queries where predicates that occur positively are not repeated)
it is shown that equivalence can be decided in polynomial time for the aggregation functions
count, max, sum, parity, prod, top2 and avg. A similar result holds for cntd provided that a
few additional conditions hold. The results are couched in terms of abstract characteristics of
aggregation functions, and new proof techniques are used. Finally, the results above also imply
that equivalence, under bag-set semantics, is decidable for non-aggregate queries with negation.

1 Introduction

The emergence of data warehouses and of decision-support systems has highlighted the importance 
of efficiently processing aggregate queries. In such systems the amount of data is generally large 
and aggregate queries are used as a standard means of reducing the volume of the data. Aggregate 
queries tend to be expensive as they "touch" many items while returning few. Thus, optimization 
techniques for aggregate queries are a necessity. Many optimization techniques, such as query 
rewriting, are based on checking query equivalence. For this purpose, a coherent understanding of 
the equivalence problem of aggregate queries is necessary. 

One of our main results in this paper is that equivalence is decidable for disjunctive queries 
with comparisons and negated subgoals if they contain one of the aggregation functions: max, 
top2, count, sum, prod, or parity. 

A query that does not have negated subgoals is positive. Equivalence of positive non-aggregate
queries has been studied extensively [?], [1], [15], [?], [17], [12]. Furthermore, in [12] it has been shown
that equivalence is decidable for non-aggregate disjunctive queries with negation. Syntactic
characterizations of equivalences among aggregate queries with the functions max, sum, and count have
been given in [13], [?]. These results have been extended in [?] to queries with the functions prod and
avg, for the special case of queries that contain neither constants nor comparisons. Thus, there are
results on the equivalence problem for non-aggregate queries with negation as well as for aggregate
queries without negation. Equivalence of aggregate queries with negated subgoals was dealt with
for the first time in [?]. This paper is a substantially revised and extended version of [?].¹

Our decidability proofs rely on abstract properties of aggregation functions. We consider func- 
tions that are defined by means of operations on abelian monoids. Our proofs work out if the 
monoids are either idempotent or are groups. Functions of the first kind are max and top2, func- 
tions of the second kind are count, sum, and parity. 

For these functions we reduce equivalence with respect to all possible databases to equivalence 
over databases that have at most as many constants as there are constants and variables in the 
queries, a property which we call local equivalence. We do not study local equivalence immediately, 
but rather the more general problem of bounded equivalence. It consists of determining, given 
a nonnegative integer N and two queries, whether the queries return identical results over all 
databases with at most N constants. We give a complete characterization of decidability of bounded 
equivalence. In particular, we show that bounded equivalence is decidable for queries with the 
functions count, cntd, max, sum, prod, avg, top2 and parity. 

Finally, we consider the special case of quasilinear queries, that is, queries where predicates that 
occur positively are not repeated. For quasilinear queries equivalence boils down to isomorphism, 
which can be decided in polynomial time. 

2 Aggregation Functions 

An aggregate query is executed in two steps. First, data is collected from a database as specified 
by the non-aggregate part of the query. Then the results are grouped into multisets (or bags), an 
aggregation function is applied to the multisets, and the aggregates are returned as answers. 

The queries that we consider in this paper contain the aggregation functions count and cntd, 
which for a bag return the number of elements or distinct elements, respectively; parity, which 
returns 0 or 1, depending on whether the number of elements in the bag is even or odd; sum, prod
and avg, which return the sum, product, or average of the elements of a bag; max, which returns 
the maximum among the elements of a bag; and top2, which returns a pair consisting of the two 
greatest different elements of a bag. 
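The following Python sketch is one way to make these eight functions concrete. It assumes that a bag is represented as a `collections.Counter` mapping numbers to multiplicities and that `None` stands in for a missing second component of top2; these choices, like the function names, are illustrative and not part of the paper.

```python
from collections import Counter
from fractions import Fraction
from math import prod as product

# A bag is modeled as a Counter: element -> multiplicity (an assumption of this sketch).

def count(bag):
    return sum(bag.values())

def cntd(bag):
    return sum(1 for m in bag.values() if m > 0)

def parity(bag):
    return count(bag) % 2

def bag_sum(bag):
    return sum(v * m for v, m in bag.items())

def bag_prod(bag):
    return product(v ** m for v, m in bag.items())

def avg(bag):
    # Exact rational average; raises ZeroDivisionError for the empty bag.
    return Fraction(bag_sum(bag), count(bag))

def bag_max(bag):
    # Assumes a nonempty bag.
    return max(bag)

def top2(bag):
    # The two greatest distinct elements, padded with None if fewer exist.
    distinct = sorted(bag, reverse=True) + [None, None]
    return (distinct[0], distinct[1])

B = Counter([3, 3, 5, 1])
print(count(B), cntd(B), parity(B), bag_sum(B), bag_prod(B), avg(B), bag_max(B), top2(B))
```

In the paper count and parity are applied to bags of empty tuples; the sketch applies them to bags of numbers only for convenience, since their result depends on the multiplicities alone.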

The reader will notice in the course of the paper that our results for max and top2 immediately 
carry over to min and bot2, which select the minimum or the two least elements out of a multiset 
of numbers. Moreover, our results for top2 can easily be generalized to the function topK, which 
selects the K greatest different elements. 

Our arguments to prove decidability of equivalence for certain classes of aggregate queries rely 
on the fact that the aggregation functions take values in special kinds of abelian monoids and are 
defined in terms of the operations of those monoids. To make this formal, we will introduce the 
class of monoidal aggregation functions and two of its subclasses. We will show that all of the 
above functions except cntd, prod and avg belong to one of these two subclasses. In general, an 
aggregation function maps multisets of tuples of numbers to values in some structure, which in most 
cases consists again of numbers. Here, we assume that the results of the aggregation are elements 
of some abelian monoid. 

An abelian monoid is a structure (M, +, 0) consisting of a set M with an associative and 
commutative binary operation, which we denote as "+", and a neutral element, which we denote 
as 0. If no confusion can arise, we identify a monoid with the set on which it is defined and refer
to (M, +, 0) simply as the monoid M.

¹ One of the proofs in [?] was incorrect and has been corrected in this version.

An abelian monoid $M$ is idempotent if $a + a = a$ holds for all $a \in M$, and $M$ is a group if for
every $a \in M$ there is a $b \in M$ such that $a + b = 0$. The element $b$ is called the inverse of $a$ and is
usually denoted as $-a$. Instead of $a + (-b)$ we will usually write $a - b$.

Examples 2.1 Standard abelian monoids are the set of integers $\mathbb{Z}$ and the set of rational numbers
$\mathbb{Q}$, with the binary operation of addition and the neutral element 0. We use $\mathbb{Q}^{\pm}$ to denote the set of
rational numbers without the element 0. Note that $\mathbb{Q}^{\pm}$, with the binary operation of multiplication
and the neutral element 1, is also an abelian monoid. A further example is the two-element group
$\mathbb{Z}_2 = \{0, 1\}$, where the addition satisfies $1 + 1 = 0$.

By $\mathbb{Q}_\bot$ we denote the rational numbers augmented by a new element $\bot$, which is less than any
element in $\mathbb{Q}$. Then $\mathbb{Q}_\bot$ is an abelian monoid if the operation is selecting the maximum of two
numbers. The neutral element is $\bot$.

A less common example is the monoid $T_2$, which is defined on the set of pairs

$$T_2 := \{(d, e) \in \mathbb{Q}_\bot \times \mathbb{Q}_\bot \mid d > e\} \cup \{(\bot, \bot)\}.$$

We denote the binary operation on $T_2$ as "$\oplus$". We define $(d_1, e_1) \oplus (d_2, e_2)$ as the pair $(d, e)$ that
consists of the two greatest different elements among $\{d_1, e_1, d_2, e_2\}$, provided this set has at least
two elements, and as the pair $(d, \bot)$ if the set consists only of the element $d$. For instance, we have
that $(5, \bot) \oplus (2, 1) = (5, 2)$, that $(5, 2) \oplus (5, 1) = (5, 2)$, and that $(5, \bot) \oplus (5, \bot) = (5, \bot)$. Clearly,
$(\bot, \bot)$ is the neutral element. □
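As a small illustration, the operation on T2 can be written out in Python. This is only a sketch under stated assumptions: `None` stands in for the bottom element, pairs are plain tuples, and the name `oplus` is chosen here for convenience.

```python
BOTTOM = None  # stand-in for the bottom element (an assumption of this sketch)

def oplus(p, q):
    """Combine two T2 pairs into the two greatest distinct numbers among their components."""
    values = sorted({v for v in (*p, *q) if v is not BOTTOM}, reverse=True)
    values += [BOTTOM, BOTTOM]          # pad so that a pair can always be returned
    return (values[0], values[1])

# The three instances given in the text, plus neutrality of (BOTTOM, BOTTOM).
print(oplus((5, BOTTOM), (2, 1)))
print(oplus((5, 2), (5, 1)))
print(oplus((5, BOTTOM), (5, BOTTOM)))
print(oplus((BOTTOM, BOTTOM), (3, 1)))
```

Because the result only depends on the set of components, the operation is commutative, associative and idempotent, which is what the later arguments rely on.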



If (M, +, 0) is an abelian monoid, we can extend the binary operation to subsets of M and to 
multisets over M in a canonical way — because of the associativity and commutativity of "+", the 
order in which we apply the operation does not matter. If S is such a set or multiset, we denote 
the result of applying "+" to S as $\sum_{a \in S} a$.

Many common aggregation functions are computed by first mapping the elements of a multiset 
of tuples into an abelian monoid and then combining the values obtained through the mapping by 
the monoid operation. 

Later on in the paper, we will assume that aggregate queries range over databases with constants
from some set $\mathcal{I}$. We call such a set a domain and assume that a linear ordering "<" is defined on
its elements. For example, the integers $\mathbb{Z}$ and the rational numbers $\mathbb{Q}$ are such domains. For our
discussion of aggregation functions the ordering on the domains is of no importance.

If $\mathcal{I}$ is a domain and $k$ is a nonnegative integer, we use $\mathcal{I}^k$ to denote the $k$-fold cartesian product
of $\mathcal{I}$. Thus, $\mathcal{I}^k$ consists of all $k$-tuples where the components are elements of $\mathcal{I}$. For the special
case in which $k = 0$, the set $\mathcal{I}^0$ consists of a single element, called the empty tuple. When $k = 1$,
we will often omit the superscript. Hence, we use $\mathcal{I}$ to denote $\mathcal{I}^1$.

Technically, we assume that there is a domain $\mathcal{I}$, a nonnegative integer $k$, a monoid $(M, +, 0)$
and a function $f\colon \mathcal{I}^k \to M$. Then the aggregation function over $\mathcal{I}^k$ based on $f$ and "+", which
maps multisets $B$ over $\mathcal{I}^k$ to elements of $M$, is denoted as $\alpha_f^+$ and defined by

$$\alpha_f^+(B) := \sum_{a \in B} f(a)$$

for all bags $B$ over $\mathcal{I}^k$. We say that $\alpha$ is a monoid aggregation function if $\alpha = \alpha_f^+$ for some abelian
monoid operation "+". In particular, we say that $\alpha_f^+$ is idempotent or a group aggregation function
if the underlying monoid is idempotent, or a group, respectively.

Examples 2.2 Obviously, sum and max are the unary aggregation functions over $\mathbb{Z}$ or $\mathbb{Q}$, based on
the identity mapping and on addition or the binary operation "max," respectively. The functions
count and parity over $\mathcal{I}^0$ arise from the additive groups $\mathbb{Z}$ and $\mathbb{Z}_2$, respectively, by choosing as $f$
the mapping that maps every element to the constant 1. Note that count and parity are nullary
aggregation functions. Therefore, the domain over which they are defined consists only of the
empty tuple. We obtain the aggregate top2 over $\mathbb{Q}$ by choosing the monoid $T_2$ and the mapping
$f\colon \mathbb{Q} \to T_2$ defined by $f(a) := (a, \bot)$. Similarly, we can define top2 over the integers.

Note that sum, count and parity are group aggregation functions, while max and top2 are
idempotent. The aggregation function prod is also a group aggregation function, when defined
over $\mathbb{Q}^{\pm}$. However, one can prove that cntd, prod (over $\mathbb{Q}$ or $\mathbb{Z}$) and avg are not monoid aggregation
functions. □
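The definition can be mirrored almost literally in code. The sketch below (Python, with bags as lists and `None` for the bottom element; `monoid_aggregate` and the other names are illustrative assumptions) builds an aggregation function from a mapping f, a monoid operation and its neutral element, and instantiates it for the functions of Examples 2.2.

```python
from functools import reduce

def monoid_aggregate(f, op, neutral):
    """Return the function that combines f(a), for all a in a bag, with op."""
    def alpha(bag):
        return reduce(op, (f(a) for a in bag), neutral)
    return alpha

def max_op(x, y):
    # Maximum on numbers extended by a bottom element (None), which is neutral.
    if x is None:
        return y
    if y is None:
        return x
    return max(x, y)

def oplus(p, q):
    # Operation of T2: the two greatest distinct numbers among the components.
    values = sorted({v for v in (*p, *q) if v is not None}, reverse=True)
    values += [None, None]
    return (values[0], values[1])

agg_sum = monoid_aggregate(lambda a: a, lambda x, y: x + y, 0)
agg_count = monoid_aggregate(lambda a: 1, lambda x, y: x + y, 0)
agg_parity = monoid_aggregate(lambda a: 1, lambda x, y: (x + y) % 2, 0)
agg_max = monoid_aggregate(lambda a: a, max_op, None)
agg_top2 = monoid_aggregate(lambda a: (a, None), oplus, (None, None))

bag = [3, 3, 5, 1]
print(agg_sum(bag), agg_count(bag), agg_parity(bag), agg_max(bag), agg_top2(bag))
```

Since count and parity ignore their argument, the same two functions also work on a bag of empty tuples such as `[(), (), ()]`, which is closer to how the paper defines them.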



3 Disjunctive Aggregate and Non-Aggregate Queries

We now introduce conjunctive and disjunctive queries with negated subgoals and review their basic 
properties. We use standard Datalog syntax extended by aggregation functions. 

3.1 Syntax of Non-aggregate Queries 

Predicate symbols are denoted as p, q or r. A term, denoted as s or t, is either a variable or a
constant. A relational atom has the form $p(s_1, \ldots, s_k)$, where p is a predicate of arity k. We also
use the notation $p(\bar{s})$, where $\bar{s}$ stands for a tuple of terms $(s_1, \ldots, s_k)$. Similarly, $\bar{x}$ stands for a
tuple of variables. An ordering atom or comparison has the form $s_1 \mathrel{\rho} s_2$, where $\rho$ is one of the
ordering predicates $<$, $\le$, $>$, $\ge$, or $\ne$. A relational atom can be negated. A relational atom that
is not negated is positive. A literal is a positive relational atom, a negated relational atom, or a
comparison. A condition, denoted as $A$, is a conjunction of literals. A condition $A$ is safe if
every variable appearing in $A$ either appears in a positive relational atom or is equated with such
a variable. Throughout this paper we will assume that all conditions are safe.

A query is a non-recursive expression of the form

$$q(\bar{x}) \leftarrow A_1 \lor \cdots \lor A_n,$$

where each $A_i$ is a condition containing all the variables appearing in the tuple $\bar{x}$. The variables
that occur in the head, i.e., in $\bar{x}$, are the distinguished variables of the query. Those that occur
only in the body are the nondistinguished variables.

A query is conjunctive if it contains only one disjunct. A query is positive if it does not contain 
any negated relational atoms. By abuse of notation, we will often refer to a query by its head q(x) 
or simply by the predicate of its head q. 

3.2 Semantics of Non-aggregate Queries 

Databases are sets of ground relational atoms and are denoted by the letter $\mathcal{D}$. The carrier of $\mathcal{D}$,
written $\mathrm{carr}(\mathcal{D})$, is the set of constants occurring in $\mathcal{D}$. In this paper we assume that the constants
in a database are either integers or rational numbers. We define how a query $q$, evaluated over a
database $\mathcal{D}$, gives rise to a set of tuples $q^{\mathcal{D}}$.

An assignment $\gamma$ for a condition $A$ is a mapping of the variables and constants appearing in $A$
to constants, such that each constant is mapped to itself. Assignments are naturally extended to
tuples, atoms and other complex syntactical objects. For $\bar{s} = (s_1, \ldots, s_k)$ we let $\gamma(\bar{s})$ denote the
tuple $(\gamma(s_1), \ldots, \gamma(s_k))$. The application of an assignment to other syntactical objects is defined
analogously. Satisfaction of atoms and of conjunctions of atoms by an assignment with respect to a
database or with respect to a semantic structure is defined in the obvious way. Sometimes, we will
refer to assignments as instantiations and to the result of applying an assignment to a syntactical
object as an instantiation of that object.

For the interpretation of comparisons it makes a difference whether they range over a dense
order, like the rational numbers, or a discrete order, like the integers. A conjunction of comparisons,
like $0 < y < z < 2$, may be satisfiable over the rational numbers, but not over the integers.
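The difference can be checked mechanically for a small chain. The following Python sketch takes the chain 0 < y < z < 2 as an illustrative instance (the two constants are an assumption of the example), searches a window of integers exhaustively, and exhibits a rational witness.

```python
from fractions import Fraction
from itertools import product

def satisfies_chain(y, z):
    return 0 < y < z < 2

# Over the integers, y and z would both have to lie strictly between 0 and 2,
# so searching a small window around that interval is enough.
integer_solutions = [(y, z) for y, z in product(range(-1, 4), repeat=2)
                     if satisfies_chain(y, z)]

# Over the rationals, a witness is easy to write down.
rational_witness = (Fraction(1, 2), Fraction(3, 2))

print(integer_solutions)
print(satisfies_chain(*rational_witness))
```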

For a given database $\mathcal{D}$, a query $q(\bar{x}) \leftarrow A_1 \lor \cdots \lor A_n$ defines a new relation

$$q^{\mathcal{D}} := \bigcup_{i=1}^{n} \{\, \gamma(\bar{x}) \mid \gamma \text{ satisfies } A_i \text{ with respect to } \mathcal{D} \,\}. \qquad (1)$$

Chaudhuri and Vardi [?] have introduced bag-set semantics, which records the multiplicity with
which a tuple occurs as an answer to the query. The definition in (1) can be turned into one
for bag-set semantics by replacing set braces by multiset braces and set union by multiset union. Bag
semantics differs from bag-set semantics in that both database relations and relations created
by queries are multisets of tuples.



3.3 Syntax of Aggregate Queries 

In [13], [5] we have shown that equivalence of positive disjunctive queries with several aggregate
terms can be reduced to equivalence of queries with a single aggregate term. Using a similar proof 
it is possible to show that this still holds if the queries can contain negated subgoals. For this 
reason, we consider in the present paper only queries having a single aggregate term in the head. 
We give a formal definition of the syntax of such queries. 

An aggregate term is an expression built up using variables and an aggregation function. For
example, count and sum(y) are aggregate terms. We use $\alpha(\bar{y})$ as an abstract notation for an
aggregate term. Note that $\bar{y}$ can be the empty tuple as in the case of the functions count or parity.

An aggregate query is a query augmented by an aggregate term in its head. Thus it has the
form

$$q(\bar{x}, \alpha(\bar{y})) \leftarrow A_1 \lor \cdots \lor A_n. \qquad (2)$$

In addition, we require that

• no variable $x \in \bar{x}$ occurs in $\bar{y}$;

• each condition $A_i$ contains all the variables in $\bar{x}$ and in $\bar{y}$.

We call $\bar{x}$ the grouping variables of the query. If the aggregate term in the head of a query has the
form $\alpha(\bar{y})$, we call the query an $\alpha$-query (e.g., a max-query).



3.4 Semantics of Aggregate Queries 

Consider an aggregate query $q$ as in Equation (2). We define how, for a database $\mathcal{D}$, the query
yields a new relation $q^{\mathcal{D}}$. We proceed in two steps.

We denote the set of assignments $\gamma$ over $\mathcal{D}$ that satisfy one of the disjuncts $A_i$ in the body of $q$
as $\Gamma(q, \mathcal{D})$. We assume that such a $\gamma$ is defined only for the variables that occur in $A_i$. Moreover, if
an assignment $\gamma$ satisfies two or more disjuncts, we want it to be included as many times in $\Gamma(q, \mathcal{D})$
as there are disjuncts it satisfies. To achieve this, we assume that there are as many copies of $\gamma$ in
$\Gamma(q, \mathcal{D})$ as there are disjuncts that $\gamma$ satisfies, and that each copy carries a label indicating which
disjunct it satisfies.²

Recall that $\bar{x}$ are the grouping variables of $q$ and $\bar{y}$ are the aggregation variables. For a tuple $\bar{d}$,
let $\Gamma_{\bar{d}}(q, \mathcal{D})$ be the subset of $\Gamma(q, \mathcal{D})$ consisting of labeled assignments $\gamma$ with $\gamma(\bar{x}) = \bar{d}$. In the sets
$\Gamma_{\bar{d}}(q, \mathcal{D})$, we group those satisfying assignments that agree on $\bar{x}$. Therefore, we call $\Gamma_{\bar{d}}(q, \mathcal{D})$ the
group of $\bar{d}$.

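Let $A$ be a set of labeled assignments and $\bar{y}$ be a tuple of variables for which the elements of $A$
are defined. Then we define the restriction of $A$ to $\bar{y}$ as the multiset

$$A|_{\bar{y}} := \{\{\, \gamma(\bar{y}) \mid \gamma \in A \,\}\}.$$

We can apply the restriction operator to $\Gamma_{\bar{d}}(q, \mathcal{D})$. If $\alpha(\bar{y})$ is an aggregate term, we can apply $\alpha$ to
the multiset $A|_{\bar{y}}$, which results in the aggregate value $\alpha(A|_{\bar{y}})$. As an alternative notation, we define

$$\alpha(\bar{y}) \downarrow A := \alpha(A|_{\bar{y}}).$$

Now we define the result of evaluating $q(\bar{x}, \alpha(\bar{y}))$ over $\mathcal{D}$, denoted $q^{\mathcal{D}}$, by

$$q^{\mathcal{D}} := \{\, (\bar{d}, \alpha(\bar{y}) \downarrow \Gamma_{\bar{d}}(q, \mathcal{D})) \mid \bar{d} = \gamma(\bar{x}) \text{ for some } \gamma \in \Gamma(q, \mathcal{D}) \,\}.$$

As for non-aggregate queries, $q^{\mathcal{D}}$ is a set of tuples.
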
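To make the two-step evaluation concrete, here is a small Python sketch. The encoding is an assumption of the example, not the paper's: variables are strings, constants are numbers, a database is a set of `(predicate, tuple)` facts, and a literal is a tuple tagged `"pos"`, `"neg"` or `"cmp"`. Because conditions are safe, the sketch simply tries all assignments of carrier constants to the variables of each disjunct and keeps one labeled copy per satisfied disjunct.

```python
from itertools import product
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "!=": operator.ne}

def value(term, gamma):
    # Variables are strings; constants are mapped to themselves.
    return gamma[term] if isinstance(term, str) else term

def variables(cond):
    terms = []
    for lit in cond:
        terms += [lit[1], lit[3]] if lit[0] == "cmp" else list(lit[2])
    return sorted({t for t in terms if isinstance(t, str)})

def satisfies(gamma, cond, db):
    for lit in cond:
        if lit[0] == "cmp":
            _, s, op, t = lit
            if not OPS[op](value(s, gamma), value(t, gamma)):
                return False
        else:
            kind, pred, terms = lit
            fact = (pred, tuple(value(t, gamma) for t in terms))
            if (fact in db) != (kind == "pos"):
                return False
    return True

def labeled_assignments(disjuncts, db):
    """All pairs (i, gamma) such that gamma satisfies the i-th disjunct."""
    carrier = sorted({c for _, args in db for c in args})
    result = []
    for i, cond in enumerate(disjuncts):
        vs = variables(cond)
        for values in product(carrier, repeat=len(vs)):
            gamma = dict(zip(vs, values))
            if satisfies(gamma, cond, db):
                result.append((i, gamma))
    return result

def evaluate(group_vars, agg, agg_vars, disjuncts, db):
    # Step 1: collect labeled assignments; step 2: group them and aggregate.
    groups = {}
    for _, gamma in labeled_assignments(disjuncts, db):
        d = tuple(gamma[x] for x in group_vars)
        groups.setdefault(d, []).append(tuple(gamma[y] for y in agg_vars))
    return {(d, agg(bag)) for d, bag in groups.items()}

# q(x, sum(y)) <- p(x, y)  OR  (p(x, y) AND NOT r(y))
disjuncts = [
    [("pos", "p", ("x", "y"))],
    [("pos", "p", ("x", "y")), ("neg", "r", ("y",))],
]
db = {("p", (1, 10)), ("p", (1, 20)), ("p", (2, 5)), ("r", (20,))}

sum_result = evaluate(["x"], lambda bag: sum(y for (y,) in bag), ["y"], disjuncts, db)
count_result = evaluate(["x"], len, [], disjuncts, db)
print(sorted(sum_result), sorted(count_result))
```

In this hypothetical database the assignment x = 1, y = 10 satisfies both disjuncts and is therefore counted twice in its group, which is exactly the effect the labels are meant to achieve.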
3.5 Equivalence 

Two queries $q$ and $q'$, aggregate or non-aggregate, are equivalent, denoted $q \equiv q'$, if over every
database they return identical sets of results, that is, if $q^{\mathcal{D}} = q'^{\mathcal{D}}$ for all databases $\mathcal{D}$. For positive
non-aggregate queries, equivalence is decidable and has been characterized in terms of the existence
of query homomorphisms [?], [?]. Levy and Sagiv have shown that equivalence is still decidable
for disjunctive queries with negated atoms [12].

In [?], [?], we have proved decidable characterizations for the equivalence of positive conjunctive
and disjunctive aggregate queries with the operators max, count, and sum. Note that two
non-aggregate queries $q(\bar{x})$ and $q'(\bar{x})$ are equivalent under bag-set semantics if and only if the
count-queries $q(\bar{x}, \mathit{count})$ and $q'(\bar{x}, \mathit{count})$ are equivalent. Thus, characterizations of the equivalence of
count-queries immediately yield criteria for non-aggregate queries to be equivalent under bag-set
semantics.



4 Bounded Equivalence 

Our goal is to reduce the problem of deciding equivalence of two aggregate queries over all possible 
databases to the problem of deciding local equivalence, that is, equivalence over databases containing 
no more constants than the size of the queries. In this section, we present the conditions necessary 
for the more general bounded equivalence problem to be decidable. 

Let $N$ be a nonnegative integer. We say that two queries $q$ and $q'$ are $N$-equivalent, denoted
$q \equiv_N q'$, if for all databases $\mathcal{D}$ whose carrier has at most $N$ elements, we have $q^{\mathcal{D}} = q'^{\mathcal{D}}$. The
bounded equivalence problem for a class of queries is to decide, given $N \ge 0$ and queries $q$ and $q'$
from that class, whether $q \equiv_N q'$.

² We could make this more formal by defining $\Gamma(q, \mathcal{D})$ to consist of pairs $(\gamma, i)$, where $\gamma$ is an (ordinary) assignment
and $i$ the index of a condition that it satisfies. However, to avoid charging our notation with too much detail, we
prefer to introduce the concept of "labeled assignments" informally.

Let $A$ be a condition. The variable size of $A$ is the number of variables in $A$. Let $q$ be a
disjunctive query. The variable size of $q$ is the maximum of the variable sizes of the conditions in
$q$. If a query contains an equality $y = z$, it does not matter for the proofs later on whether the
variables $y$ and $z$ are counted once or twice.

The term size of a query is the total number of constants occurring in that query plus the
variable size. The term size of a pair of queries $q$ and $q'$ is the total number of constants occurring
in at least one of $q$ or $q'$ plus the maximum of the variable sizes of $q$ and $q'$. We denote the term
size of $q$ as $\tau(q)$ and the term size of $q$ and $q'$ as $\tau(q, q')$. We say that two queries $q$ and $q'$ are locally
equivalent if $q \equiv_{\tau(q, q')} q'$, that is, if $q$ and $q'$ return identical results over all databases with at
most $\tau(q, q')$ constants.
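The size measures are simple to compute. The Python sketch below assumes a deliberately minimal encoding, invented for this example: a query is a list of conditions, a condition is a list of `(symbol, terms)` literals, variables are strings and constants are numbers.

```python
def terms_of(cond):
    # A condition is a list of literals; each literal is (symbol, terms).
    return {t for _, terms in cond for t in terms}

def variable_size(query):
    # Maximum number of distinct variables over the conditions of the query.
    return max(len({t for t in terms_of(c) if isinstance(t, str)}) for c in query)

def constants(query):
    return {t for c in query for t in terms_of(c) if not isinstance(t, str)}

def term_size(*queries):
    # Constants occurring in at least one query, plus the largest variable size.
    all_constants = set().union(*(constants(q) for q in queries))
    return len(all_constants) + max(variable_size(q) for q in queries)

# q:  (p(x, y) AND y > 3)  OR  (p(x, y) AND NOT r(y, 0))
q = [[("p", ("x", "y")), (">", ("y", 3))],
     [("p", ("x", "y")), ("not r", ("y", 0))]]
# q': p(x, y) AND p(y, z) AND z != 3
q_prime = [[("p", ("x", "y")), ("p", ("y", "z")), ("!=", ("z", 3))]]

print(term_size(q), term_size(q_prime), term_size(q, q_prime))
```

For this hypothetical pair, local equivalence would mean agreement on all databases whose carrier has at most `term_size(q, q_prime)` constants.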

Clearly, two queries are equivalent if and only if they are $N$-equivalent for all $N > 0$.
However, the decidability of bounded equivalence for a class of queries does not necessarily imply that
equivalence is decidable. Later sections establish criteria for this implication to hold. Moreover,
decidability of $N$-equivalence, for a fixed $N$, does not imply decidability of local equivalence, since
in the latter problem the size of the databases to be tested depends on the size of the queries.

Proposition 4.1 (Bounded and Local Equivalence) If the bounded equivalence problem is de- 
cidable for a class of queries, then local equivalence is decidable, too. 

Proof. Deciding local equivalence of $q$ and $q'$ boils down to deciding bounded equivalence of $q$
and $q'$ for $N = \tau(q, q')$. □

In the rest of this section we study the decidability of the bounded equivalence problem for 
several aggregation functions. Note that $N$-equivalence is not necessarily a trivial property. Even
if the size of databases is bounded, there are still infinitely many databases whose size is below the 
bound, and the aggregation results may well depend on the values of the constants in the given 
database. 

We introduce the notion of shiftable aggregation functions and of order-decidable aggregation 
functions. We show that shiftable aggregation functions are a special case of order-decidable aggre- 
gation functions. Finally, we prove that bounded equivalence is decidable exactly for queries with 
order-decidable aggregation functions. 

4.1 Shiftable Aggregation Functions 

We introduce the notion of shiftable aggregation functions. Intuitively, the value of such a function 
does not depend on the specific values in a multiset, but only on the ordering of the elements. 

Let $D$ and $D'$ be subsets of a domain $\mathcal{I}$ and $\varphi\colon D \to D'$ be a function. We say that $\varphi$ is a
shifting function over $\mathcal{I}$ if for all $d, d' \in D$ we have

$$d < d' \;\Rightarrow\; \varphi(d) < \varphi(d').$$

In other words, a shifting function over a domain is a strictly monotonic function from one subset
of the domain to another subset. A shifting function is applied to bags as one would expect. Let $\alpha$
be an aggregation function that is defined over $\mathcal{I}^k$. We say that $\alpha$ is shiftable if for all subsets $D$
and $D'$ of $\mathcal{I}$, for all shifting functions $\varphi\colon D \to D'$, and for all bags $B$ and $B'$ with elements in $D^k$,
we have

$$\alpha(B) = \alpha(B') \;\Rightarrow\; \alpha(\varphi(B)) = \alpha(\varphi(B')).$$

Proposition 4.2 (Shiftable Aggregation Functions) The aggregation functions parity, cntd,
count, max and top2 are shiftable.



Proof. The results of the aggregation functions parity and count depend only on the number 
of elements in the bag to which they are applied. Applying a shifting function to a bag does not 
affect this number. Therefore, these functions are shiftable. Similarly, the result of the aggregation 
function cntd depends only on the number of distinct elements in the bag to which it is applied. 
Since shifting functions are always injective, cntd is also shiftable. 

The aggregation function max chooses the greatest element in a bag. The order of the elements
is preserved by a shifting function. Thus, $\max(\varphi(B)) = \varphi(\max(B))$. By definition, $\varphi$ is an injection.
Therefore, $\max(B) = \max(B')$ if and only if $\varphi(\max(B)) = \varphi(\max(B'))$, which is true if and only if
$\max(\varphi(B)) = \max(\varphi(B'))$. Hence, max is shiftable.

Using similar reasoning to that of max it is easy to see that top2 is shiftable. □

Note, however, that the aggregation functions sum and prod are not shiftable. For example,
consider the bags $B = \{\{2, 2\}\}$ and $B' = \{\{4\}\}$ and suppose $\varphi$ is a shifting function with $\varphi(2) = 3$
and $\varphi(4) = 5$. Then sum(B) = sum(B') = prod(B) = prod(B') = 4, while neither sum nor prod
agree on $\varphi(B) = \{\{3, 3\}\}$ and $\varphi(B') = \{\{5\}\}$.
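The counterexample is easy to replay. The following Python sketch represents bags as lists and the shifting function as a dictionary (both representation choices, and the helper names, are assumptions made for the illustration); it checks the example for sum and prod and contrasts it with cntd on the same bags.

```python
from math import prod

def is_shifting(phi):
    # Strictly monotonic on its (finite) domain.
    keys = sorted(phi)
    return all(phi[a] < phi[b] for a, b in zip(keys, keys[1:]))

def shift(phi, bag):
    return [phi[v] for v in bag]

def cntd(bag):
    return len(set(bag))

phi = {2: 3, 4: 5}
B, B_prime = [2, 2], [4]

print(is_shifting(phi))
print(sum(B), sum(B_prime), sum(shift(phi, B)), sum(shift(phi, B_prime)))
print(prod(B), prod(B_prime), prod(shift(phi, B)), prod(shift(phi, B_prime)))
print(cntd(B), cntd(B_prime), cntd(shift(phi, B)), cntd(shift(phi, B_prime)))
```

A single check like this can refute shiftability but cannot establish it, since the definition quantifies over all shifting functions and all pairs of bags.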

4.2 Order-Decidable Aggregation Functions 

Before defining order-decidable aggregation functions, we present some auxiliary definitions. Given
a domain $\mathcal{I}$, a conjunction of ordering atoms $L$, and an ordering atom $t \mathrel{\rho} t'$, we define in the
standard way when $L$ entails $t \mathrel{\rho} t'$ with respect to $\mathcal{I}$, denoted $L \models_{\mathcal{I}} t \mathrel{\rho} t'$, and when $L$ is satisfiable
with respect to $\mathcal{I}$.

We say that $L$ is a complete ordering of a set of terms $T$ with respect to $\mathcal{I}$ if for every two terms
$t, t' \in T$, exactly one of the following holds:

• $L \models_{\mathcal{I}} t < t'$;

• $L \models_{\mathcal{I}} t > t'$;

• $L \models_{\mathcal{I}} t = t'$.

Note that by definition, complete orderings are satisfiable.

Let $\alpha$ be an aggregation function over $\mathcal{I}^k$. An ordered identity for $\alpha$ is a formula

$$L \to \alpha(B) = \alpha(B') \qquad (3)$$

where $L$ is a complete ordering of some set of terms $T$ with respect to $\mathcal{I}$, and $B$ and $B'$ are bags
containing $k$-tuples of terms from $T$. We say that $\alpha$ is order-decidable over $\mathcal{I}$ if the validity of
ordered identities for $\alpha$ is decidable over $\mathcal{I}$. Note that the validity of an ordered identity may be
dependent on $\mathcal{I}$.

Formula (3) is valid if for every assignment $\delta$ that maps the variables in $L$ to $\mathcal{I}$ and satisfies $L$,
we have that $\alpha$ yields the same values when applied to $\delta(B)$ and to $\delta(B')$.

Example 4.3 It is easy to see that the function cntd is order-decidable over any domain. Consider,
for example, the bags $B = \{\{1, 2, u\}\}$ and $B' = \{\{v, v, 7, 8\}\}$ and an arbitrary complete ordering $L$
of $\{1, 2, u, v, 7, 8\}$. It is straightforward to decide whether the formula

$$L \to \mathit{cntd}(\{\{1, 2, u\}\}) = \mathit{cntd}(\{\{v, v, 7, 8\}\})$$

is valid, since for any assignment $\delta$ satisfying $L$ the number of distinct values in the bags $\delta(B)$
and $\delta(B')$ is not dependent on the values assigned to $u$ and $v$. In fact, the number of distinct
elements that are contained in $\delta(B)$ and $\delta(B')$ depends entirely on the ordering $L$. □

It is not by chance that the function cntd is order-decidable over all domains. It is actually a 
consequence of the fact that cntd is a shiftable aggregation function. 

Theorem 4.4 (Shiftable Implies Order-Decidable) Let $\alpha$ be a shiftable aggregation function
defined over $\mathcal{I}^k$. Then $\alpha$ is order-decidable over $\mathcal{I}$.

Proof. Let $L \to \alpha(B) = \alpha(B')$ be an ordered identity as in (3). In principle, to check this
identity for validity, one has to verify that for all $\delta$ satisfying $L$ the equality $\alpha(\delta(B)) = \alpha(\delta(B'))$
holds. We will show that it is sufficient to verify the equality for a single $\delta$ if $\alpha$ is shiftable.

Suppose that $\alpha$ is a shiftable aggregation function over $\mathcal{I}^k$. Let $\delta\colon T \to \mathcal{I}$ be an assignment.
Clearly, if $\delta$ satisfies $L$, then the following conditions hold:

• $\delta$ maps all constants to themselves;

• for all $t, t' \in T$ and all ordering predicates $\rho$ we have that

$$t \mathrel{\rho} t' \in L \;\Rightarrow\; \models_{\mathcal{I}} \delta(t) \mathrel{\rho} \delta(t').$$

Consider Formula (3). Since $L$ is a complete ordering, $L$ is satisfiable with respect to $\mathcal{I}$. Let
$\delta$ be an assignment satisfying $L$. Now, let $\delta'\colon T \to \mathcal{I}$ be a second assignment that satisfies $L$.
We assume without loss of generality that there are no two different terms $t_1, t_2 \in T$ for which
$L \models_{\mathcal{I}} t_1 = t_2$. (If there were such terms we could remove one of them by renaming.) Hence, $\delta$ and
$\delta'$ are injections. Thus, the function $\delta' \circ \delta^{-1}$ is well defined.

Since both $\delta$ and $\delta'$ preserve order, $\delta' \circ \delta^{-1}$ is a shifting function. Thus, $\alpha(\delta(B)) = \alpha(\delta(B'))$
implies $\alpha(\delta'(B)) = \alpha(\delta'(B'))$, as required. □
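Read constructively, the proof suggests a decision procedure for shiftable functions: build any one assignment that satisfies the ordering and compare the two aggregate values. The Python sketch below does this over the rationals. It assumes that the complete ordering is supplied as a list of classes of equal terms in strictly increasing order, that variables are strings and constants are numbers, and that the ordering is satisfiable over the rationals; `witness` and `valid_for_shiftable` are names invented for the example.

```python
from fractions import Fraction

def witness(classes):
    """One rational assignment satisfying the ordering given by `classes`."""
    values = []
    for cls in classes:
        cs = [t for t in cls if not isinstance(t, str)]
        values.append(Fraction(cs[0]) if cs else None)   # constants keep their value
    i = 0
    while i < len(values):
        if values[i] is not None:
            i += 1
            continue
        j = i
        while j < len(values) and values[j] is None:
            j += 1
        n = j - i                       # run of classes that contain only variables
        lo = values[i - 1] if i > 0 else None
        hi = values[j] if j < len(values) else None
        if lo is None and hi is None:
            lo, hi = Fraction(0), Fraction(n + 1)
        elif lo is None:
            lo = hi - (n + 1)
        elif hi is None:
            hi = lo + (n + 1)
        step = (hi - lo) / (n + 1)      # spread the run strictly between lo and hi
        for k in range(n):
            values[i + k] = lo + step * (k + 1)
        i = j
    return {t: v for cls, v in zip(classes, values) for t in cls}

def valid_for_shiftable(alpha, classes, bag, bag_prime):
    # Sound only for shiftable alpha: one satisfying assignment decides validity.
    delta = witness(classes)
    return alpha([delta[t] for t in bag]) == alpha([delta[t] for t in bag_prime])

def cntd(bag):
    return len(set(bag))

B, B_prime = [1, 2, "u"], ["v", "v", 7, 8]
print(valid_for_shiftable(cntd, [[1], [2], ["u"], ["v"], [7], [8]], B, B_prime))
print(valid_for_shiftable(cntd, [[1], [2], ["u"], [7, "v"], [8]], B, B_prime))
```

The sketch covers only the dense case; over the integers a satisfying assignment has to respect the gaps between constants, so `witness` would need to be adapted.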



The other direction of Theorem 4.4 does not hold. An aggregation function can be
order-decidable over a given domain even if it is not shiftable. For example, the aggregation functions
sum and avg are order-decidable, although they are not shiftable.

Proposition 4.5 (Order Decidability of Sum and Average) The aggregation functions sum 
and avg are order-decidable over Z and over Q. 

Proof. For the aggregation function sum, Formula (3) can be expressed using Presburger
arithmetic. Recall that Presburger arithmetic is the first-order theory of addition. Presburger
showed [14] that Presburger integer arithmetic (i.e., where the variables range over the integers) is
decidable. Similarly, Presburger rational arithmetic is also known to be decidable [10]. Therefore,
sum is order-decidable.

The order-decidability of avg follows in a straightforward fashion from the order-decidability of
sum, as we now show. Let $B$ be a bag of size $N$. We use $N \otimes B$ to denote the bag derived from $B$
by increasing the multiplicity of each term in $B$ by a factor of $N$. Thus, $N \otimes B$ contains exactly
the same terms as those in $B$. If a term $t$ appears in $B$ exactly $k$ times, then $t$ appears in $N \otimes B$
exactly $Nk$ times.

Consider bags of numbers $B$ and $B'$. Suppose that $B$ is of size $N$ and $B'$ is of size $N'$. Observe
that

$$\mathit{avg}(B) = \mathit{avg}(B') \iff N' \cdot \mathit{sum}(B) = N \cdot \mathit{sum}(B') \iff \mathit{sum}(N' \otimes B) = \mathit{sum}(N \otimes B').$$

Therefore,

$$L \to \mathit{avg}(B) = \mathit{avg}(B')$$

is valid if and only if

$$L \to \mathit{sum}(N' \otimes B) = \mathit{sum}(N \otimes B')$$

is valid, where $N$ and $N'$ are the cardinalities of $B$ and $B'$, respectively. Hence, since sum is
order-decidable, avg is also order-decidable. □
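On concrete bags of numbers the reduction from avg to sum can be checked directly. The sketch below is a Python illustration under the assumption that bags are `collections.Counter` objects; `scale` plays the role of the multiplicity-scaling operation, and all names are chosen for this example. The paper applies the same idea to bags of terms, which this sketch does not attempt.

```python
from collections import Counter
from fractions import Fraction

def size(bag):
    return sum(bag.values())

def bag_sum(bag):
    return sum(t * m for t, m in bag.items())

def avg(bag):
    return Fraction(bag_sum(bag), size(bag))

def scale(n, bag):
    # Multiply the multiplicity of every element by n.
    return Counter({t: n * m for t, m in bag.items()})

def avg_equal_via_sum(bag, bag_prime):
    # Compare averages using sums only, as in the reduction above.
    return bag_sum(scale(size(bag_prime), bag)) == bag_sum(scale(size(bag), bag_prime))

B = Counter([1, 3])
B_same = Counter([2, 2, 2])
B_other = Counter([1, 2])
print(avg_equal_via_sum(B, B_same), avg_equal_via_sum(B, B_other))
```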

The aggregation function prod is also order-decidable. In order to show this result we first 
present a few necessary definitions and lemmas. These are needed when considering prod over the 
integers. 

Let $T$ be a set of terms and let $L$ be a complete ordering of $T$. We say that $T$ is reduced with
respect to $L$ and to a domain $\mathcal{I}$ if

• there are no different variables $x$ and $y$ occurring in $T$ such that $L \models_{\mathcal{I}} x = y$;

• there is no variable $x$ occurring in $T$ and no constant $d$ in $\mathcal{I}$ such that $L \models_{\mathcal{I}} x = d$.

We say that a constant $c$ is a possible value for a variable $x \in T$ with respect to $L$ and $\mathcal{I}$ if there
is an assignment for the variables in $T$ with constants from $\mathcal{I}$ that satisfies $L$ and maps $x$ to the
value $c$. Observe that if $T$ is reduced with respect to $L$ and $\mathcal{I}$, then there are at least two different
possible values for each variable in $T$. Also, note that $T$ may be reduced with respect to $L$ over the
rational numbers, but not over the integers. For instance, $T = \{0, x, 2\}$ is reduced with respect
to $L = \{0 < x, x < 2\}$ over the rational numbers, but over the integers, $L$ entails that $x = 1$.

Lemma 4.6 (Assignments for Possible Values) Let L be a complete ordering of the terms in 
T. Suppose that T is reduced with respect to L and to X. Let x be a variable in T and let c\ and c 2 
be possible values for x with respect to L and X. Then there are assignments Si and S 2 for the terms 
in T that satisfy L, are equal on all terms other than x, and such that 8\{x) = c\ and ^(x) = ci- 

Proof. Since $c_1$ and $c_2$ are possible values for x, there are assignments $\delta_1$ and $\delta_2$ that satisfy L
such that $\delta_i(x) = c_i$, for i = 1, 2. Note that $\delta_1$ and $\delta_2$ may also differ on additional terms in T.
Let $\delta$ be the assignment for the terms in T defined by

\[
\delta(t) :=
\begin{cases}
\min\{\delta_1(t), \delta_2(t)\} & \text{if } L \models_{\mathcal{I}} t \le x \\
\max\{\delta_1(t), \delta_2(t)\} & \text{if } L \models_{\mathcal{I}} t > x.
\end{cases}
\]

We show that $\delta$ satisfies L. Let t and t' be terms in T. Suppose that $L \models_{\mathcal{I}} t < t'$. We consider two
cases.

Case 1. Suppose that $L \models_{\mathcal{I}} t' \le x$. Let i be such that $\delta(t') = \delta_i(t')$. Then,
$\delta(t) \le \delta_i(t) < \delta_i(t') = \delta(t')$. Therefore, $\delta$ satisfies t < t'.

Case 2. Suppose that $L \models_{\mathcal{I}} x < t'$. Let i be such that $\delta(t) = \delta_i(t)$. Then,
$\delta(t) = \delta_i(t) < \delta_i(t') \le \delta(t')$. Therefore, $\delta$ satisfies t < t'.

Since L is a complete ordering and t < t' was arbitrary, it follows that $\delta$ satisfies L. In a similar
fashion we define the assignment $\delta'$ as

\[
\delta'(t) :=
\begin{cases}
\min\{\delta_1(t), \delta_2(t)\} & \text{if } L \models_{\mathcal{I}} t < x \\
\max\{\delta_1(t), \delta_2(t)\} & \text{if } L \models_{\mathcal{I}} t \ge x.
\end{cases}
\]

We can show, as above, that $\delta'$ satisfies L. Clearly $\delta$ and $\delta'$ are equal on all terms other than x.
One of the assignments $\delta$ or $\delta'$ maps x to $c_1$ and one maps x to $c_2$. Therefore, we have found
assignments as required. □
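The min/max construction in this proof can be sketched in a few lines. The sketch below rests on simplifying assumptions that are not part of the lemma: the ordering L is strict (no two distinct terms are equated) and is passed as a sorted list of terms; the names `merge_assignments` and `satisfies` are hypothetical.

```python
def satisfies(assignment, order):
    """Check that the assignment respects the strict ordering given by `order`."""
    return all(assignment[a] < assignment[b] for a, b in zip(order, order[1:]))

def merge_assignments(d1, d2, x, order):
    """Combine two assignments that satisfy the ordering into two assignments
    that agree everywhere except on x (terms below x take the minimum,
    terms above x take the maximum)."""
    pos = {t: i for i, t in enumerate(order)}
    low, high = {}, {}
    for t in order:
        lo_v, hi_v = min(d1[t], d2[t]), max(d1[t], d2[t])
        if pos[t] < pos[x]:
            low[t] = high[t] = lo_v
        elif pos[t] > pos[x]:
            low[t] = high[t] = hi_v
        else:
            # On x itself, one result takes the smaller and one the larger value.
            low[t], high[t] = lo_v, hi_v
    return low, high

order = ["a", "x", "b"]          # L: a < x < b
d1 = {"a": 0, "x": 1, "b": 5}    # maps x to c1 = 1
d2 = {"a": 2, "x": 3, "b": 4}    # maps x to c2 = 3
low, high = merge_assignments(d1, d2, "x", order)
print(low, high)
```

In this run the two results differ only on x, and both still satisfy a < x < b.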
Proposition 4.7 (Order Decidability of Product) The aggregation function prod is
order-decidable over Z and over Q.

Proof. Let $T = \{t_1, \ldots, t_n\}$ be a set of terms with constants from Z or Q, and let L be a
complete ordering of T. Let B and B' be bags of terms from T. We will show that it is possible to
decide whether

\[ L \rightarrow prod(B) = prod(B') \tag{4} \]

is valid over Z or Q, respectively. Clearly, Formula (4) is valid if $prod(\delta(B)) = prod(\delta(B'))$ for all
assignments $\delta$ that satisfy L.

There may be assignments that satisfy L and map variables to the constant 0. It is important to
be able to recognize these assignments. Let $T' = T \cup \{0\}$. Let L' be a complete ordering of T' that
is a conservative extension of L. Formally, this means that for all terms $t, t' \in T$, the orderings L
and L' imply the same relationship between t and t'. Note that if $0 \in T$, then L' must be equivalent
to L. There are only finitely many conservative extensions L' of L, and an assignment satisfies L if
and only if it satisfies one of the extensions L'. Thus, to prove our claim, we can assume without
loss of generality that L in Formula (4) is a complete ordering of a set of terms that contains the
constant 0.

Furthermore, we can assume without loss of generality that in Formula (4) the set of terms T
is reduced with respect to L. Otherwise, whenever T contains a variable y and a term t such that
y and t are distinct, but $L \models_{\mathcal{I}} y = t$, we replace y with t for every occurrence of y in L, B
and B'. Eventually, we end up with a set of terms T, a complete ordering L of T, and bags B, B'
of terms from T such that T is reduced with respect to L, and $L \rightarrow prod(B) = prod(B')$ is valid if
and only if Formula (4) is valid.

Next, we rewrite the equation "prod(B) = prod(B')". We note that for every assignment,
prod(B) yields the same value as the polynomial $c\,u_1^{m_1} \cdots u_k^{m_k}$, where

• c is the product of all the constants in B;

• $u_1, \ldots, u_k$ are all the variables in T;

• $m_i$ is the multiplicity of $u_i$ in B.

Similarly, prod(B') yields the same value as some polynomial $d\,u_1^{n_1} \cdots u_k^{n_k}$. Now, deciding the
validity of Formula (4) amounts to deciding whether the equation

\[ c\,(\delta(u_1))^{m_1} \cdots (\delta(u_k))^{m_k} = d\,(\delta(u_1))^{n_1} \cdots (\delta(u_k))^{n_k} \tag{5} \]

holds for all assignments $\delta$ satisfying L.

If c = d = 0, then clearly Equation (5) holds for any $\delta$. Similarly, Equation (5) holds for any $\delta$
if c = d and $m_i = n_i$ for all i. We show that if neither of the above conditions holds, then there is
an assignment $\delta$ that satisfies L and for which Equation (5) is not true. We consider two cases.

Case 1. Suppose that $m_i = n_i$ for all i, but $c \neq d$. Since L is a complete ordering, it is
satisfiable. Let $\delta$ be an assignment that satisfies L. Since the set of terms T is reduced with respect
to L, and 0 is an element of T, the ordering L imposes a strict inequality between 0 and each
variable. Therefore, $\delta$ cannot map any variable to the constant 0. Hence, $\delta$ is a counterexample to
the correctness of Equation (5).

Case 2. Suppose that one of c or d is non-zero and that there is an index i such that $m_i \neq n_i$.
Again, since T is reduced with respect to L, there are at least two possible values for $u_i$, say $c_1$
and $c_2$. Let $\delta_1$ and $\delta_2$ be assignments that agree on all terms other than $u_i$ and that satisfy
$\delta_j(u_i) = c_j$ for j = 1, 2. Such assignments exist according to Lemma 4.6.

As before, $\delta_1$ and $\delta_2$ cannot map any variable to the constant 0. If c = 0 and $d \neq 0$, then both $\delta_1$
and $\delta_2$ are counterexamples to the correctness of Equation (5). Similarly, they are counterexamples
if $c \neq 0$ and d = 0. Therefore, assume that $c \neq 0$ and $d \neq 0$.

Suppose, by way of contradiction, that Equation (5) holds for both $\delta_1$ and $\delta_2$. Then,

\[ c\,(\delta_j(u_1))^{m_1} \cdots (\delta_j(u_k))^{m_k} = d\,(\delta_j(u_1))^{n_1} \cdots (\delta_j(u_k))^{n_k} \]

for j = 1, 2. Since the assignment $\delta_2$ does not map any variable to 0, we can divide the equation
with j = 1 by the equation with j = 2. Note that $\delta_1$ and $\delta_2$ are equal on all terms other than $u_i$.
Therefore, after simplifying, we derive

\[ \left( \frac{\delta_1(u_i)}{\delta_2(u_i)} \right)^{m_i} = \left( \frac{\delta_1(u_i)}{\delta_2(u_i)} \right)^{n_i}. \tag{6} \]

Note that $\delta_1(u_i) / \delta_2(u_i) \neq 0$, since $\delta_1$ does not map any variable to the constant 0. In addition,
$\delta_1(u_i) / \delta_2(u_i) \neq 1$, since $\delta_1$ and $\delta_2$ differ on $u_i$.

Finally, $\delta_1(u_i) / \delta_2(u_i) \neq -1$, since $\delta_1$ and $\delta_2$ must both map $u_i$ to positive numbers or both map
$u_i$ to negative numbers. A rational number other than 0, 1 and -1 has pairwise different powers, and
$m_i \neq n_i$. Therefore, Equation (6) cannot hold and either $\delta_1$ or $\delta_2$ is a counterexample
to the correctness of Equation (5). This completes Case 2.

Thus, we have shown how to decide the validity of Formula (4) over both the integers and the
rational numbers. This completes the proof. □
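The case analysis in this proof yields a simple test once the preprocessing steps have been carried out. The following is a minimal sketch, assuming that T is already reduced with respect to L and contains the constant 0 (so that no variable can take the value 0); variables are represented as strings and constants as numbers, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def monomial(bag):
    """Split a bag of terms into the product of its constants and the
    multiplicities of its variables (strings are variables)."""
    coeff = Fraction(1)
    exps = Counter()
    for term in bag:
        if isinstance(term, str):
            exps[term] += 1
        else:
            coeff *= term
    return coeff, exps

def prod_identity_valid(bag1, bag2):
    """Decide L -> prod(bag1) = prod(bag2), under the assumptions above:
    valid iff both coefficients are 0, or coefficients and exponents agree."""
    c, m = monomial(bag1)
    d, n = monomial(bag2)
    if c == 0 and d == 0:
        return True
    return c == d and m == n

print(prod_identity_valid(["x", 2, "y"], ["y", "x", 2]))
print(prod_identity_valid(["x", "x"], ["x"]))
```

The ordering L itself does not appear in the sketch because, after the reduction described in the proof, the answer depends only on the two monomials.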



4.3 Decidability of Bounded Equivalence 

It is possible to show that bounded equivalence can be decided for $\alpha$-queries containing comparisons
that range over $\mathcal{I}$ if the aggregation function $\alpha$ is order-decidable over $\mathcal{I}$. Actually, bounded
equivalence for $\alpha$-queries ranging over $\mathcal{I}$ is decidable if and only if $\alpha$ is order-decidable over $\mathcal{I}$.
This gives a complete characterization of decidability of bounded equivalence of aggregate queries
with negation, disjunction, constants and comparisons. In addition, we derive as a direct result that
bounded equivalence is decidable for queries with a wide range of common aggregation functions.

Theorem 4.8 (Bounded Equivalence and Order-Decidability) Let $\alpha$ be an aggregation
function over $\mathcal{I}^k$. Then the bounded equivalence problem is decidable for disjunctive $\alpha$-queries with
comparisons ranging over $\mathcal{I}$ if and only if $\alpha$ is order-decidable over $\mathcal{I}$.

Proof. "$\Leftarrow$" Suppose that $\alpha$ is order-decidable over $\mathcal{I}$. Consider $\alpha$-queries q and q'. We show
how to check, given some N > 0, whether $q \equiv_N q'$.

Let C be the set of constants appearing in q or q' and let U be a set of N variables. We use T
to denote $C \cup U$. Let P be the set of predicates appearing either in q or in q'. The set P contains
predicates that appear either positively or negatively in the queries. We use ary(p) to denote the
arity of a predicate $p \in P$. We denote by BASE the set of all atoms that can be created using the
terms in T and the predicates in P. Formally,

\[ \mathrm{BASE} := \{\, p(t_1, \ldots, t_{ary(p)}) \mid p \in P \text{ and } t_1, \ldots, t_{ary(p)} \in T \,\}. \]

If $\delta$ is an assignment that maps variables in U to elements of $\mathcal{I}$, and if S is a subset of BASE,
then instantiating S by $\delta$ results in a database $\delta(S)$, the carrier of which has at most N elements.
To decide whether $q \equiv_N q'$ it is sufficient to evaluate the queries over databases of the form $\delta(S)$
where $S \subseteq \mathrm{BASE}$. Essentially, if we consider only databases of this form, we rule out databases
containing predicates not appearing in q or q'. Clearly, such predicates cannot affect the evaluation
of q and q'.
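The construction of BASE and of an instantiated database can be sketched as follows. This is an illustration under assumed representations that are not taken from the paper: predicates are given as a mapping from name to arity, variables are strings, constants are numbers, and an atom is a pair of a predicate name and an argument tuple.

```python
from itertools import product

def build_base(predicates, terms):
    """All atoms p(t_1, ..., t_ary(p)) over the given terms.
    predicates: dict mapping predicate name to arity."""
    terms = list(terms)
    return {(p, args)
            for p, arity in predicates.items()
            for args in product(terms, repeat=arity)}

def instantiate(subset, delta):
    """Apply an assignment (dict variable -> constant) to a set of atoms."""
    return {(p, tuple(delta.get(a, a) if isinstance(a, str) else a for a in args))
            for p, args in subset}

# One constant (0) and N = 2 variables, a unary predicate p and a binary predicate r.
base = build_base({"p": 1, "r": 2}, [0, "u1", "u2"])
database = instantiate({("p", ("u1",)), ("r", ("u1", 0))}, {"u1": 7, "u2": 9})
print(len(base), sorted(database))
```

With three terms the sketch produces 3 unary and 9 binary atoms, which already hints at how quickly the number of subsets of BASE grows.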

Consider now a fixed subset $S \subseteq \mathrm{BASE}$. We will show how to check whether q and q' return the
same results over all instantiations $\delta(S)$. Since there exist infinitely many instantiations, we cannot
check each of them separately. Instead, we divide the instantiations into finitely many equivalence
classes over which we can decide the equivalence of q and q'. The equivalence classes are defined by
the complete orderings L of T, that is, for each L we simultaneously check all instantiations $\delta(S)$
where $\delta$ satisfies L.

In addition to S, consider a complete ordering L of T. Instead of an instantiation $\delta(S)$, we
attempt to evaluate q and q' immediately over S, based on the ordering of terms defined by L. We
view the set S equipped with this ordering as a database, denoted $S_L$. Obviously, given S and L,
we can compute the bags containing the tuples returned by q and q'. However, it is impossible to
compute the values of the aggregation function $\alpha$ for these bags because T, and therefore the bags,
may contain variables. At this point, we make use of the fact that $\alpha$ is order-decidable over $\mathcal{I}$.

Suppose that the tuple of grouping variables $\bar x$ in $q(\bar x, \alpha(\bar y))$ and $q'(\bar x, \alpha(\bar y'))$ has length k. Let $\bar t$
be a k-tuple of terms in T. Note that there are only finitely many such tuples because T is finite.
Recall that $\Gamma_{\bar t}(q, S_L)$ is the set of assignments $\gamma$ that satisfy q over $S_L$ and where $\gamma(\bar x) = \bar t$. Consider
the bag $B_{\bar t}$ defined as

\[ B_{\bar t} := \Gamma_{\bar t}(q, S_L)|_{\bar y}, \]

that is, $B_{\bar t}$ consists of the restrictions of elements of $\Gamma_{\bar t}(q, S_L)$ to the variables in $\bar y$. Let $B'_{\bar t}$ be
defined analogously for q'.

Now, assume that there is an assignment $\delta$ that satisfies L such that over the database $\delta(S)$
the queries q and q' do not return the same aggregate value for $\delta(\bar t)$. This is the case if and only if
the formula

\[ L \rightarrow \alpha(B_{\bar t}) = \alpha(B'_{\bar t}) \]

is not valid over $\mathcal{I}$. Since $\alpha$ is order-decidable over $\mathcal{I}$, the validity of the formula can be determined.

"$\Rightarrow$" We show that if bounded equivalence of $\alpha$-queries ranging over $\mathcal{I}$ is decidable, then $\alpha$ is
order-decidable over $\mathcal{I}$. To simplify our notation, we assume without loss of generality that $\alpha$ is a
unary function. Let $T = \{t_1, \ldots, t_N\}$ be a set of terms, L be a complete ordering of T, and B and
B' be bags of terms from T. We will construct $\alpha$-queries q and q' such that

\[ L \rightarrow \alpha(B) = \alpha(B') \tag{7} \]

is valid over $\mathcal{I}$ if and only if $q \equiv_N q'$.

We assume without loss of generality that L does not equate two different terms. Otherwise,
we could remove one of them by renaming. We define the condition A as

\[ A := p(t_1) \wedge p(t_2) \wedge \cdots \wedge p(t_N). \]

Suppose that $B = \{\!\{ s_1, \ldots, s_m \}\!\}$ and $B' = \{\!\{ s'_1, \ldots, s'_n \}\!\}$. We assume that y is a new variable, i.e.,
y does not appear in B or B'. We define the conditions

\[ A_i := A \wedge L \wedge y = s_i, \quad i = 1, \ldots, m \]
\[ A'_j := A \wedge L \wedge y = s'_j, \quad j = 1, \ldots, n \]

and we define the queries q and q' by

\[ q(\alpha(y)) \leftarrow \bigvee_{i=1}^{m} A_i \quad \text{and} \quad q'(\alpha(y)) \leftarrow \bigvee_{j=1}^{n} A'_j. \]

Suppose that $q \equiv_N q'$. We prove this implies that Formula (7) is valid. Let $\delta$ be an arbitrary
assignment that satisfies L. We show that $\alpha(\delta(B)) = \alpha(\delta(B'))$. Consider the database $\mathcal{D}$ obtained
by instantiating A with $\delta$, i.e.,

\[ \mathcal{D} := \{\, p(\delta(t_1)), \ldots, p(\delta(t_N)) \,\}. \]

Clearly, q and q' retrieve $\alpha(\delta(B))$ and $\alpha(\delta(B'))$, respectively, over $\mathcal{D}$. The database $\mathcal{D}$ contains at
most N constants. Thus, since $q \equiv_N q'$, the two aggregates are equal.

Now suppose that Formula (7) is valid. We prove this implies that $q \equiv_N q'$. Let $\mathcal{D}$ be an
arbitrary database containing at most N constants. If $\mathcal{D}$ contains fewer than N values, then no
assignment over $\mathcal{D}$ can satisfy L, because L is a complete ordering that does not equate variables.
Hence, q and q' will not return any value over $\mathcal{D}$.

Therefore, assume that $\mathcal{D}$ contains exactly N values. It is easy to see that q is satisfiable over
$\mathcal{D}$ if and only if q' is satisfiable over $\mathcal{D}$. In such a case, for each condition $A_i$ or $A'_j$ there is exactly
one satisfying assignment over $\mathcal{D}$, say, $\gamma_i$ or $\gamma'_j$, respectively. Moreover, the assignments $\gamma_i$ and
$\gamma'_j$ agree on all variables except for y. That is, there is an assignment $\delta$ for the variables in T
such that $\gamma_i$ and $\gamma'_j$ agree with $\delta$ on T, for all i and j. Due to the definition of $A_i$ and $A'_j$, we
also have $\gamma_i(y) = \delta(s_i)$ and $\gamma'_j(y) = \delta(s'_j)$. Thus, q collects over $\mathcal{D}$ the bag
$\{\!\{ \gamma_1(y), \ldots, \gamma_m(y) \}\!\} = \{\!\{ \delta(s_1), \ldots, \delta(s_m) \}\!\} = \delta(B)$ and returns the aggregate $\alpha(\delta(B))$. Similarly, q' returns the aggregate
$\alpha(\delta(B'))$. From the assumption that Formula (7) is valid, and the fact that $\delta$ satisfies L, it follows
that $\alpha(\delta(B)) = \alpha(\delta(B'))$. Hence, q and q' return the same values over $\mathcal{D}$. Since $\mathcal{D}$ was chosen
arbitrarily, this proves that $q \equiv_N q'$. □



From the proof of Theorem 4.8 we can derive an upper bound on the complexity of determining
whether $q \equiv_N q'$. In fact, the proof describes a procedure for checking N-equivalence. Suppose
that there are C constants in q and q'. Let T := C + N. As the first step of the procedure, the set
BASE is created. This set contains all possible instantiations of the atoms in q and q' with the C
constants and with N variables. Clearly, the cardinality of this set is exponential in T = C + N.
Then, each subset S of BASE is considered. The number of such subsets is, thus, double exponential
in T. A subset S is considered in conjunction with a complete ordering L of the T terms. Note
that there are at most $2^{T-1}\,T!$ complete orderings of T terms. (This is a rough upper bound, since
we can arrange the T terms in T! orders and then place a "<" or "=" sign between each pair of
adjacent terms.) Thus, considering all complete orderings does not affect the already double
exponential order of complexity.

For each pair S, L we evaluate q and q'. Evaluating q roughly takes time $T^{|q|}$, where |q| is the
size of q, since we must try all instantiations of the terms in q. This too does not affect the order
of complexity, since the computation time is only exponential, while there are a double exponential
number of subsets S. For each tuple $\bar t$ that instantiates the grouping variables and thus defines
a group, we check the validity of the ordered identity defined in Formula (3) for L and the bags
created.

Each bag can have at most $T^{|q|}$ many elements. However, the number of different elements is
bounded by $T^k$, where k is the arity of the aggregation function $\alpha$. Therefore, to represent a bag
we only need space polynomial in T. Thus, the size of the ordered identities to be checked for
validity is polynomial in T. As long as this check takes no more than double exponential time, the
overall complexity is at most double exponential. In many cases, this step is much more efficient.
For example, given the aggregation function count, this step only requires checking the cardinality
of the bags, and hence, is linear.

To summarize, if the validity of ordered identities of the form (3) can be checked in double
exponential time, we derive a double exponential upper bound for the complexity of checking
N-equivalence.
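To get a feeling for these bounds, the sizes involved can be computed for a small instance. The sketch below is only an illustration: the function name and the choice of predicate arities are assumptions, and the ordering count is the rough upper bound $2^{T-1}\,T!$ quoted above rather than the exact number of complete orderings.

```python
from math import factorial

def search_space_bounds(num_constants, n, arities):
    """Return (|BASE|, number of subsets of BASE, bound on complete orderings)
    for C constants, N variables and predicates with the given arities."""
    t = num_constants + n
    base_size = sum(t ** a for a in arities)   # one atom per predicate and argument tuple
    subsets = 2 ** base_size                   # every S that is a subset of BASE
    orderings = 2 ** (t - 1) * factorial(t)    # rough upper bound from the text
    return base_size, subsets, orderings

# One constant, N = 2, a unary and a binary predicate.
print(search_space_bounds(1, 2, [1, 2]))
# One constant, N = 4: the number of subsets already dominates everything else.
print(search_space_bounds(1, 4, [1, 2]))
```

Even for these tiny inputs the number of subsets of BASE is far larger than the number of orderings, in line with the observation that the orderings do not change the order of complexity.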



The following corollary follows directly from Theorems 4.4 and 4.8.

Corollary 4.9 (Bounded Equivalence and Shiftable Functions) Let $\alpha$ be a shiftable
aggregation function over $\mathcal{I}^k$. Then for disjunctive $\alpha$-queries with comparisons ranging over $\mathcal{I}$ the
bounded equivalence problem is decidable.

Corollary 4.10 (Decidable Query Classes) For the classes of disjunctive max, sum, prod,
avg, cntd, count, parity and top2 queries, bounded equivalence is decidable provided that the
comparisons range over Q or Z.

Proof. For the classes of disjunctive max, cntd, count, parity and top2 queries, the claim follows directly
from Corollary 4.9. For disjunctive sum and avg queries, decidability follows from the
order-decidability of these functions and Theorem 4.8. Similarly, for disjunctive prod queries, the claim
follows from Proposition 4.7 and Theorem 4.8. □



In Theorem 4.8 we reduced order-decidability to bounded equivalence. A point of interest is 
that in our reduction we only used positive queries. Therefore, negation in queries q and q' does 
not affect decidability of bounded equivalence of q and q' . 

Corollary 4.11 (Bounded Equivalence of Queries Without Negation) The bounded
equivalence problem is decidable for positive disjunctive $\alpha$-queries with comparisons ranging over $\mathcal{I}$ if
and only if the bounded equivalence problem is decidable for disjunctive $\alpha$-queries with negation
and comparisons ranging over $\mathcal{I}$.



5 Decomposition Principles 

Levy and Sagiv [11] have shown that two disjunctive non-aggregate queries are equivalent if they
are equivalent over all databases whose carrier is not greater than the size of the queries. For
non-aggregate queries this is not surprising since an answer by a query q depends only on a single
assignment satisfying q. Hence, if over some database $\mathcal{D}$, by means of the assignment $\gamma$, the query q
returns the tuple $\bar d$, but q' does not return $\bar d$, then we can construct a database $\mathcal{D}_0 \subseteq \mathcal{D}$ that contains
only constants occurring in q, q' and $\gamma$ such that $\bar d \in q^{\mathcal{D}_0}$ and $\bar d \notin q'^{\mathcal{D}_0}$.

For aggregate queries this argument cannot be applied since the results of a query are the 
amalgamation of many single results that may involve arbitrarily many constants in the database. 
Nevertheless, for queries with an idempotent monoid or a group aggregation function we can reduce 
equivalence over arbitrary databases to equivalence over small databases. 

As a first step, we formulate decomposition principles for these two classes of functions. Such 
a principle provides a method to compute the value of an aggregation over a union of sets of 
assignments from aggregations over the sets themselves and possibly some of their subsets. 

Note that the $\sum$ in the equation below is the extension of the idempotent monoid operation.
In the case of max, for instance, the right hand side of the equation becomes $\max_{i=1}^{k} (\max(y) \downarrow A_i)$.

Proposition 5.1 (Idempotent Decomposition Principle) Let $\alpha$ be an idempotent monoid
aggregation function and $(A_i)_{i=1}^{k}$ a family of sets of assignments, all defined for $\bar y$. Then

\[ \alpha(\bar y) \downarrow \bigcup_{i=1}^{k} A_i = \sum_{i=1}^{k} \bigl( \alpha(\bar y) \downarrow A_i \bigr). \tag{8} \]

Proof. Let $(A_i)_{i=1}^{k}$ be a family of sets of assignments. On the left hand side of Equation (8),
first the union of the $A_i$ is taken and $\alpha$ is applied afterwards. On the right hand side, $\alpha$ is applied
first to (a projection of) each set $A_i$, and then the sum of the results is taken. Of course, it is
possible for two different sets $A_i$ and $A_j$ to contain common elements. Note that on the left hand
side, these duplicates are removed, whereas on the right hand side they are preserved.

The aggregation function $\alpha$ is associative and commutative. Therefore, the order in which $\alpha$ is
applied to the assignments is not important. Hence, the only difference between the two sides of
Equation (8) that might affect the result is that on the right hand side the same assignment may appear
in the summation several times.

However, because of the associativity and commutativity of the monoid operation, we may first
sum such assignments with themselves. Since $\alpha$ is idempotent, the final result is the same as if
each element had occurred only once. Therefore, Equation (8) holds. □
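A small experiment illustrates why idempotence matters here. The sketch below assumes that an assignment is represented as a pair whose second component is the value of y, and that an aggregation function is given by a binary operation with a neutral element; these representations and the helper names are chosen only for this illustration.

```python
from functools import reduce
import operator

def apply_to_set(op, neutral, assignments):
    """Aggregate the y-values (second component) of a set of assignments."""
    return reduce(op, (y for _, y in assignments), neutral)

def via_union(op, neutral, family):
    """Left hand side of the principle: aggregate over the union of the sets."""
    return apply_to_set(op, neutral, set().union(*family))

def via_parts(op, neutral, family):
    """Right hand side: aggregate each set, then combine the results."""
    return reduce(op, (apply_to_set(op, neutral, a) for a in family), neutral)

# The assignment (2, 7) occurs in both sets.
family = [{(1, 5), (2, 7)}, {(2, 7), (3, 4)}]

max_union = via_union(max, float("-inf"), family)
max_parts = via_parts(max, float("-inf"), family)
sum_union = via_union(operator.add, 0, family)
sum_parts = via_parts(operator.add, 0, family)
print(max_union, max_parts, sum_union, sum_parts)
```

For max the two sides coincide although one assignment is shared, whereas for sum, which is not idempotent, the shared assignment is counted twice on the right hand side.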

Before we treat the case of group aggregation functions, we remind the reader of the well-known 
Principle of Inclusion and Exclusion for computing the cardinality of a union of sets. It says that 
for any finite family of sets $(A_i)_{i=1}^{k}$ we have

\[ \Bigl| \bigcup_{i=1}^{k} A_i \Bigr| = \sum_{i=1}^{k} |A_i| - \sum_{i<j} |A_i \cap A_j| + \cdots + (-1)^{k-1} \Bigl| \bigcap_{i=1}^{k} A_i \Bigr|. \tag{9} \]

For group aggregation functions, we can generalize Equation (9). In the following decomposition
principle, the "−"-sign denotes the inverse with respect to the group operation. Note that for
$\alpha$ = count, Equation (10) simplifies to Equation (9), since $(count \downarrow A) = |A|$ for every set of
assignments A.

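Proposition 5.2 (Group Decomposition Principle) Suppose that $\alpha$ is a group aggregation
function and $(A_i)_{i=1}^{k}$ is a finite family of sets of assignments, all defined for $\bar y$. Then

\[ \alpha(\bar y) \downarrow \bigcup_{i=1}^{k} A_i = \sum_{i=1}^{k} \bigl( \alpha(\bar y) \downarrow A_i \bigr) - \sum_{i<j} \bigl( \alpha(\bar y) \downarrow A_i \cap A_j \bigr) + \cdots + (-1)^{k-1} \Bigl( \alpha(\bar y) \downarrow \bigcap_{i=1}^{k} A_i \Bigr). \tag{10} \]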

Proof. Equation (10) can be proved in the same fashion that the Principle of Inclusion and
Exclusion is proved. Note that the right hand side of the equation is well defined since $\alpha$ takes
values in an abelian group.

Let $\gamma$ be an assignment. If $\gamma$ is not in $A_i$ for any i, then clearly, $\gamma$ does not affect the value on
the left or the right hand side of Equation (10).

Suppose that $\gamma$ is in r different sets $A_i$. The assignment $\gamma$ contributes $\gamma(\bar y)$ once to the union
of sets to which $\alpha$ is applied on the left hand side of the equation. On the right hand side of the
equation, since $\gamma$ is in r different sets $A_i$, the value $\gamma(\bar y)$ is added and subtracted (possibly) several
times. To prove equality of the two sides, it is sufficient to show that on the right hand side $\gamma(\bar y)$
is added exactly one more time than it is subtracted.

Since $\gamma$ occurs in r sets $A_i$, it occurs in $\binom{r}{2}$ intersections of two sets, $\binom{r}{3}$ intersections of three
sets, etc. Thus, on the right hand side, $\gamma(\bar y)$ is added r times, then subtracted $\binom{r}{2}$ times, then
added $\binom{r}{3}$ times, etc. In total, $\gamma$ contributes the tuple $\gamma(\bar y)$

\[ \binom{r}{1} - \binom{r}{2} + \binom{r}{3} - \cdots + (-1)^{r-1} \binom{r}{r} = 1 - (1 - 1)^r = 1 \]

times to the right hand side of Equation (10), as required. □
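The group decomposition principle can be tried out for sum, where the group operation is ordinary addition. The following sketch assumes, for illustration only, that assignments are pairs whose second component is the value of y; the helper names are hypothetical.

```python
from itertools import combinations

def sum_y(assignments):
    """Apply sum to the y-values (second component) of a set of assignments."""
    return sum(y for _, y in assignments)

def inclusion_exclusion_sum(family):
    """Right hand side of the principle for sum: alternating sum over all
    intersections of 1, 2, ..., k sets of the family."""
    total = 0
    for size in range(1, len(family) + 1):
        sign = 1 if size % 2 == 1 else -1
        for group in combinations(family, size):
            total += sign * sum_y(set.intersection(*group))
    return total

# The assignment (2, 7) lies in all three sets, so r = 3 for it.
family = [{(1, 5), (2, 7)}, {(2, 7), (3, 4)}, {(2, 7), (4, 1)}]
left = sum_y(set().union(*family))
right = inclusion_exclusion_sum(family)
print(left, right)
```

The shared assignment is added three times, subtracted three times and added once more, so it contributes exactly once, as in the counting argument of the proof.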

Because of the above two propositions, we say that idempotent monoid and group aggregation 
functions are decomposable. 

6 Reducing Equivalence to Local Equivalence 

We now show that for queries with decomposable aggregation functions, local equivalence implies 
equivalence. To this end we first show that, given two queries and a database, we can identify small 
subsets of the database, such that the satisfying assignments over the database are the union of the 
satisfying assignments over the subsets. Then we apply the decomposition principles to conclude 
that the queries return the same result over the original database from the fact that the queries 
return the same results over the small databases. 

Let $q_1$ and $q_2$ be disjunctive queries, $\mathcal{D}$ be a database, and $\bar d$ be a tuple of constants. Let $(\mathcal{D}_i)_{i=1}^{k}$
be a family of databases with $\mathcal{D}_i \subseteq \mathcal{D}$ for all i = 1, . . . , k. Then $(\mathcal{D}_i)_{i=1}^{k}$ is a decomposition of $\mathcal{D}$
with respect to $q_1$, $q_2$ and $\bar d$ if the following holds:

1. $|carr(\mathcal{D}_i)| \le \tau(q_1, q_2)$ for all i = 1, . . . , k;

2. $\Gamma_{\bar d}(q_j, \mathcal{D}) = \bigcup_{i=1}^{k} \Gamma_{\bar d}(q_j, \mathcal{D}_i)$ for j = 1, 2;

3. $\bigcap_h \Gamma_{\bar d}(q_j, \mathcal{D}_{i_h}) = \Gamma_{\bar d}(q_j, \bigcap_h \mathcal{D}_{i_h})$ for j = 1, 2 and for all subfamilies $(\mathcal{D}_{i_h})_h$ of $(\mathcal{D}_i)_i$.

The first condition means that, intuitively, the databases $\mathcal{D}_i$ are small. The second condition says
that for each $q_j$, j = 1, 2, we obtain exactly the satisfying assignments over $\mathcal{D}$ that return $\bar d$ if we
evaluate $q_j$ over each $\mathcal{D}_i$ separately and select the assignments that return $\bar d$ over $\mathcal{D}_i$. The third
condition says that for each $q_j$, in order to obtain the intersection of the assignment sets $\Gamma_{\bar d}(q_j, \mathcal{D}_{i_h})$,
it suffices to evaluate $q_j$ over the intersection of the databases $\mathcal{D}_{i_h}$.

We will prove that given a pair of queries q and q', a database $\mathcal{D}$ and a tuple $\bar d$, there exists a
decomposition of $\mathcal{D}$ with respect to q, q' and $\bar d$. To this end, we will first prove a series of lemmas.

We consider queries q and q' defined as

\[ q(\bar x, \alpha(\bar y)) \leftarrow \bigvee_{i \in I} P_i \wedge N_i \wedge C_i \]

\[ q'(\bar x, \alpha(\bar y')) \leftarrow \bigvee_{j \in J} P'_j \wedge N'_j \wedge C'_j \]

where $P_i$ and $P'_j$ are conjunctions of positive relational atoms, $N_i$ and $N'_j$ are conjunctions of negated
relational atoms and $C_i$ and $C'_j$ are conjunctions of comparisons. We use $A_i$ as a shorthand for
$P_i \wedge N_i \wedge C_i$ and we use $A'_j$ as a shorthand for $P'_j \wedge N'_j \wedge C'_j$.

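Let $\mathcal{D}$ be a database and let $\bar d$ be a tuple. We must show that there exists a decomposition of
$\mathcal{D}$ with respect to q, q' and $\bar d$.

Procedure: Extend_Database($\mathcal{D}_\gamma$, q, q', $\mathcal{D}$)
Input:     Database $\mathcal{D}_\gamma$ to be extended,
           queries q, q',
           original database $\mathcal{D}$
Output:    Extension of $\mathcal{D}_\gamma$ with respect to q, q' and $\mathcal{D}$

 1. l := 0
 2. $\mathcal{D}_0$ := $\mathcal{D}_\gamma$
 3. Repeat
 4.     l := l + 1
 5.     $\mathcal{D}_l$ := $\mathcal{D}_{l-1}$
 6.     If there is an assignment $\delta$ of q into $\mathcal{D}_{l-1}$ that satisfies some $A_e$ in q
        such that $(\exists a)[\neg a \in N_e \wedge \delta(a) \in \mathcal{D}]$
 7.     then $\mathcal{D}_l$ := $\mathcal{D}_l \cup \{\, \delta(a) \mid \neg a \in N_e \wedge \delta(a) \in \mathcal{D} \,\}$
 8.     If there is an assignment $\delta'$ of q' into $\mathcal{D}_{l-1}$ that satisfies some $A'_{e'}$ in q'
        such that $(\exists a)[\neg a \in N'_{e'} \wedge \delta'(a) \in \mathcal{D}]$
 9.     then $\mathcal{D}_l$ := $\mathcal{D}_l \cup \{\, \delta'(a) \mid \neg a \in N'_{e'} \wedge \delta'(a) \in \mathcal{D} \,\}$
10. Until $\mathcal{D}_l$ = $\mathcal{D}_{l-1}$
11. Return $\mathcal{D}_l$

Figure 1: Procedure used to extend a database.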
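The procedure of Figure 1 can be sketched in executable form. The sketch below is a deterministic variant written under several assumptions that are not part of the paper: a database is a set of ground atoms (predicate name, argument tuple); a query is a list of disjuncts, each a triple of positive atoms, negated atoms and a comparison test on assignments; variables are strings; queries are safe; and in every round all qualifying assignments are processed, whereas Figure 1 picks one. All function names are hypothetical.

```python
def substitute(atom, gamma):
    """Instantiate the variables (strings) of an atom with an assignment."""
    pred, args = atom
    return (pred, tuple(gamma[a] if isinstance(a, str) else a for a in args))

def matches(positives, db, gamma=None):
    """Yield every assignment that maps all positive atoms into db."""
    gamma = gamma or {}
    if not positives:
        yield dict(gamma)
        return
    (pred, args), rest = positives[0], positives[1:]
    for fact_pred, fact_args in db:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        new = dict(gamma)
        # Bind variables consistently; constants must match exactly.
        if all((new.setdefault(a, v) == v) if isinstance(a, str) else a == v
               for a, v in zip(args, fact_args)):
            yield from matches(rest, db, new)

def satisfying(disjunct, db):
    """Yield assignments that satisfy a whole disjunct over db."""
    positives, negatives, comparison = disjunct
    for gamma in matches(positives, db):
        if comparison(gamma) and all(substitute(a, gamma) not in db for a in negatives):
            yield gamma

def extend_database(d_gamma, queries, d):
    """Add atoms of d until no satisfying assignment over the current
    database instantiates a negated atom to an atom of d."""
    current = set(d_gamma)
    while True:
        added = set()
        for query in queries:
            for disjunct in query:
                for gamma in satisfying(disjunct, current):
                    added |= {substitute(a, gamma) for a in disjunct[1]} & d
        if not added:
            return current
        current |= added

# q has the single disjunct p(x) AND NOT r(x); r(1) holds in the original database.
q = [([("p", ("x",))], [("r", ("x",))], lambda g: True)]
d = {("p", (1,)), ("r", (1,))}
extended = extend_database({("p", (1,))}, [q], d)
print(sorted(extended))
```

In this run the atom r(1) is added, so the extended database no longer admits the assignment x = 1 as a satisfying assignment of q, just as the original database does not.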



We create a decomposition of $\mathcal{D}$ with respect to q, q' and $\bar d$ in a two-step process. We first
create databases out of the satisfying assignments of q and of q' into $\mathcal{D}$ that retrieve $\bar d$. Next, we
extend these databases using the procedure Extend_Database to prevent them from satisfying
negated atoms that were not satisfied in $\mathcal{D}$.

Recall that $\Gamma_{\bar d}(q, \mathcal{D})$ is the set of satisfying assignments from q into $\mathcal{D}$ that retrieve $\bar d$. We denote
the disjunct of q satisfied by $\gamma \in \Gamma_{\bar d}(q, \mathcal{D})$ as $A_\gamma$. For each $\gamma \in \Gamma_{\bar d}(q, \mathcal{D})$, we define a database

\[ \mathcal{D}_\gamma := \{\, \gamma(a) \mid a \in P_\gamma \,\}. \]

We use this notation since we consider a database to be a set of ground positive relational atoms.
Note that $\mathcal{D}_\gamma$ satisfies the positive atoms in $A_\gamma$ with respect to the assignment $\gamma$. However, we
must extend the databases $\mathcal{D}_\gamma$ to ensure that $\mathcal{D}_\gamma$ does not satisfy negated atoms that were not
satisfied in $\mathcal{D}$. We now create a database $\mathcal{D}^*_\gamma$ out of $\mathcal{D}_\gamma$ using the procedure Extend_Database
presented in Figure 1.³ Formally, we define $\mathcal{D}^*_\gamma$ := Extend_Database($\mathcal{D}_\gamma$, q, q', $\mathcal{D}$).

In a similar fashion, we create databases $\mathcal{D}_{\gamma'}$ out of the satisfying assignments $\gamma' \in \Gamma_{\bar d}(q', \mathcal{D})$ of
q' into $\mathcal{D}$ that retrieve $\bar d$. As above, these databases are extended to derive databases $\mathcal{D}^*_{\gamma'}$, using the
procedure Extend_Database.

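We now define

\[ \Delta := \{\, \mathcal{D}^*_\gamma \mid \gamma \in \Gamma_{\bar d}(q, \mathcal{D}) \,\} \cup \{\, \mathcal{D}^*_{\gamma'} \mid \gamma' \in \Gamma_{\bar d}(q', \mathcal{D}) \,\}. \tag{11} \]

We present a series of lemmas that will enable us to prove that $\Delta$ is a decomposition of $\mathcal{D}$ w.r.t.
q, q' and $\bar d$. We first note that clearly for all $\mathcal{D}^* \in \Delta$, it holds that $\mathcal{D}^* \subseteq \mathcal{D}$.

³ This process does not necessarily uniquely determine the database $\mathcal{D}^*_\gamma$. However, this is not important for our
proof.
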

The first lemma states that the databases in $\Delta$ have the correct number of constants, i.e., that
Property 1 of decompositions holds for $\Delta$.

Lemma 6.1 (Size of Databases) For all databases $\mathcal{D}^* \in \Delta$, it holds that $|carr(\mathcal{D}^*)| \le \tau(q, q')$.

Proof. Consider a database $\mathcal{D}^*_\gamma \in \Delta$. Clearly, $\mathcal{D}_\gamma$ contains at most $\tau(q)$ constants. Note that
when an atom is added during the procedure, the constants appearing in the atom must have
already appeared in the database $\mathcal{D}_\gamma$, or must appear in q or q'. This follows since the queries
are safe and all variables in negated atoms must also appear in positive atoms (or be equated to
variables appearing in positive atoms). Thus, $\mathcal{D}^*_\gamma$ contains at most $\tau(q, q')$ constants.

Similarly, one can show that for $\mathcal{D}^*_{\gamma'} \in \Delta$, it holds that $|carr(\mathcal{D}^*_{\gamma'})| \le \tau(q, q')$. Thus, it
follows that for all $\mathcal{D}^* \in \Delta$, we have $|carr(\mathcal{D}^*)| \le \tau(q, q')$. □

We show Property 2 of decompositions for $\Delta$.

Lemma 6.2 (Assignments into $\mathcal{D}$ and $\Delta$) The following relationships hold between the
assignments of q and q' into $\mathcal{D}$ and into databases in $\Delta$:

1. $\Gamma_{\bar d}(q, \mathcal{D}) = \bigcup_{\mathcal{D}^* \in \Delta} \Gamma_{\bar d}(q, \mathcal{D}^*)$;

2. $\Gamma_{\bar d}(q', \mathcal{D}) = \bigcup_{\mathcal{D}^* \in \Delta} \Gamma_{\bar d}(q', \mathcal{D}^*)$.

Proof. We only prove Part 1. Part 2 can be shown analogously. We show the set equality in
Part 1 by proving two inclusions.

"$\subseteq$" Suppose that $\gamma \in \Gamma_{\bar d}(q, \mathcal{D})$. It is enough to show that $\gamma$ satisfies q over $\mathcal{D}^*_\gamma$, which entails
that $\gamma \in \Gamma_{\bar d}(q, \mathcal{D}^*_\gamma)$.

Let a be a positive relational atom in the conjunct $A_\gamma$. Then $\gamma(a) \in \mathcal{D}_\gamma$ by definition. Clearly,
$\mathcal{D}_\gamma \subseteq \mathcal{D}^*_\gamma$ and thus, $\gamma(a) \in \mathcal{D}^*_\gamma$. If $\neg b$ is a negated relational atom in $A_\gamma$ then $\gamma(b) \notin \mathcal{D}$. Otherwise,
$\gamma$ would not be a satisfying assignment of q in $\mathcal{D}$. According to the definition of $\mathcal{D}^*_\gamma$, it holds that
$\mathcal{D}^*_\gamma \subseteq \mathcal{D}$, and therefore, $\gamma(b) \notin \mathcal{D}^*_\gamma$. The satisfaction of $C_\gamma$ (the comparisons in $A_\gamma$) depends only
on $\gamma$ and not on any database. Thus, $\gamma$ is a satisfying assignment of q over $\mathcal{D}^*_\gamma$.

"$\supseteq$" It suffices to show that for all $\mathcal{D}^* \in \Delta$ it holds that $\Gamma_{\bar d}(q, \mathcal{D}) \supseteq \Gamma_{\bar d}(q, \mathcal{D}^*)$. Suppose that
$\gamma \in \Gamma_{\bar d}(q, \mathcal{D}^*)$. We show that $\gamma$ is a satisfying assignment of q over $\mathcal{D}$.

Suppose that $\gamma$ satisfies the conjunct $A_\gamma$ of q in $\mathcal{D}^*$. Consider a literal l in $A_\gamma$. If l is a positive
relational atom then $\gamma(l) \in \mathcal{D}^*$. We know that $\mathcal{D}^* \subseteq \mathcal{D}$, thus, $\gamma(l) \in \mathcal{D}$. Suppose that l is a negated
relational atom of the form $\neg b$, and suppose, by way of contradiction, that $\gamma(b) \in \mathcal{D}$. Then $\gamma$
satisfies the condition in line 6 of the procedure Extend_Database presented above. Thus, $\gamma(b)$
would have been added to $\mathcal{D}^*$ in contradiction to the fact that $\gamma$ is a satisfying assignment of q into
$\mathcal{D}^*$. Finally, note that the satisfaction of comparisons depends only on the assignment, and not on
the database. Thus, $\gamma$ is a satisfying assignment of q over $\mathcal{D}$. □



In Lemma 6.3, Property 3 of decompositions is proved for $\Delta$.

Lemma 6.3 (Assignments into Intersections) The following relationships hold between
intersections of sets of assignments and intersections of sets of databases:

1. $\bigcap_h \Gamma_{\bar d}(q, \mathcal{D}^*_h) = \Gamma_{\bar d}(q, \bigcap_h \mathcal{D}^*_h)$;

2. $\bigcap_h \Gamma_{\bar d}(q', \mathcal{D}^*_h) = \Gamma_{\bar d}(q', \bigcap_h \mathcal{D}^*_h)$

for all subfamilies $(\mathcal{D}^*_h)_h$ of $\Delta$.

Proof. We only prove Part 1. Part 2 can be shown analogously. We show the set equality in
Part 1 by proving two inclusions.

"$\supseteq$" Let $\gamma$ be an assignment in $\Gamma_{\bar d}(q, \bigcap_h \mathcal{D}^*_h)$. Suppose that $\gamma$ satisfies $A_\gamma$ of q in $\bigcap_h \mathcal{D}^*_h$.
Satisfaction of $C_\gamma$ is dependent only on $\gamma$. Let a be a positive atom in $A_\gamma$. The atom $\gamma(a)$ appears
in $\bigcap_h \mathcal{D}^*_h$ and thus, $\gamma(a)$ appears in $\mathcal{D}^*_h$ for all h. Thus, $\gamma$ satisfies the positive atoms of $A_\gamma$ in each
of the $\mathcal{D}^*_h$. Now, let l be a negated atom in $A_\gamma$ of the form $\neg b$. Clearly, $\gamma(b) \notin \bigcap_h \mathcal{D}^*_h$. Suppose,
by way of contradiction, that $\gamma(b) \in \mathcal{D}^*_h$ for some h. Then $\gamma(b) \in \mathcal{D}$, since $\mathcal{D}^*_h \subseteq \mathcal{D}$. However,
it follows that we would have added $\gamma(b)$ to $\mathcal{D}^*_h$ for all h, since $\gamma$ satisfies the condition in line 6
of Extend_Database. Thus, $\gamma(b) \in \bigcap_h \mathcal{D}^*_h$ in contradiction to the assumption. This proves that
$\gamma \in \bigcap_h \Gamma_{\bar d}(q, \mathcal{D}^*_h)$. Hence, we have shown that every $\gamma$ that is an element of the right hand side of
the equation in Part 1 is also an element of the left hand side.

"$\subseteq$" Suppose that $\gamma \in \bigcap_h \Gamma_{\bar d}(q, \mathcal{D}^*_h)$. Then $\gamma \in \Gamma_{\bar d}(q, \mathcal{D}^*_h)$ for all h. Let $A_\gamma$ be a conjunct
such that $\gamma$ satisfies $A_\gamma$ in $\mathcal{D}^*_h$ for all h. (Recall the definition of $\Gamma(q, \mathcal{D})$ in Section 3.4, where we
have assumed that each $\gamma$ carries a label, recording which disjunct of q it satisfies.) Once again,
satisfaction of $C_\gamma$ is dependent only on $\gamma$. Consider a positive relational atom a in $A_\gamma$. Then
$\gamma(a) \in \mathcal{D}^*_h$ for all h. Thus, $\gamma(a) \in \bigcap_h \mathcal{D}^*_h$. Similarly, consider a negated atom l of the form $\neg b$ in $A_\gamma$.
Then $\gamma(b) \notin \mathcal{D}^*_h$ for all h, and thus, $\gamma(b) \notin \bigcap_h \mathcal{D}^*_h$. This proves that $\gamma \in \Gamma_{\bar d}(q, \bigcap_h \mathcal{D}^*_h)$. Hence, the
second inclusion holds as well. □

We can now prove our theorem about the existence of decompositions. 

Theorem 6.4 (Existence of Database Decompositions) Let q and q' be a pair of
disjunctive queries, let $\mathcal{D}$ be a database, and let $\bar d$ be a tuple of constants from $\mathcal{D}$. Then there exists a
decomposition of $\mathcal{D}$ with respect to q, q' and $\bar d$.

Proof. From Lemmas 6.1, 6.2 and 6.3 it follows that $\Delta$ as defined in Equation (11) is a
decomposition of $\mathcal{D}$ with respect to q, q' and $\bar d$ as required. □

Finally, we reduce equivalence to local equivalence.

Theorem 6.5 (Reduction to Local Equivalence) Let $\alpha$ be a decomposable aggregation
function, and let q and q' be disjunctive $\alpha$-queries. Then q and q' are equivalent if and only if they are
locally equivalent.

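Proof. We only have to show that local equivalence implies equivalence. Suppose therefore that
q and q' agree on all databases whose carrier has at most $\tau(q, q')$ elements. Let $\mathcal{D}$ be any database
and $\bar d$ be a tuple of constants. It suffices to show that

\[ \alpha(\bar y) \downarrow \Gamma_{\bar d}(q, \mathcal{D}) = \alpha(\bar y) \downarrow \Gamma_{\bar d}(q', \mathcal{D}). \]

Let $(\mathcal{D}_i)_{i=1}^{k}$ be a decomposition of $\mathcal{D}$ with respect to q, q' and to $\bar d$. If $\alpha$ is an idempotent
monoid function, we apply Proposition 5.1, which yields

\begin{align*}
\alpha(\bar y) \downarrow \Gamma_{\bar d}(q, \mathcal{D})
  &= \alpha(\bar y) \downarrow \bigcup_{i=1}^{k} \Gamma_{\bar d}(q, \mathcal{D}_i) \tag{12a} \\
  &= \sum_{i=1}^{k} \bigl( \alpha(\bar y) \downarrow \Gamma_{\bar d}(q, \mathcal{D}_i) \bigr) \tag{12b} \\
  &= \sum_{i=1}^{k} \bigl( \alpha(\bar y) \downarrow \Gamma_{\bar d}(q', \mathcal{D}_i) \bigr) \tag{12c} \\
  &= \alpha(\bar y) \downarrow \bigcup_{i=1}^{k} \Gamma_{\bar d}(q', \mathcal{D}_i) \tag{12d} \\
  &= \alpha(\bar y) \downarrow \Gamma_{\bar d}(q', \mathcal{D}), \tag{12e}
\end{align*}

where Equations (12a) and (12e) hold because of Property 2 of decompositions, Equations (12b)
and (12d) hold because of Proposition 5.1, and Equation (12c) holds because q and q' are locally
equivalent and the databases $\mathcal{D}_i$ contain at most $\tau(q, q')$ constants.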



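If $\alpha$ is a group aggregation function, we apply Proposition 5.2, which yields the equations

\begin{align}
\alpha(y) \downarrow \Gamma_{\bar{d}}(q, \mathcal{D})
  &= \alpha(y) \downarrow \bigcup_{i=1}^{k} \Gamma_{\bar{d}}(q, \mathcal{D}_i) \tag{13a} \\
  &= \sum_{i=1}^{k} \bigl( \alpha(y) \downarrow \Gamma_{\bar{d}}(q, \mathcal{D}_i) \bigr) - \cdots
     + (-1)^{k-1} \Bigl( \alpha(y) \downarrow \bigcap_{i=1}^{k} \Gamma_{\bar{d}}(q, \mathcal{D}_i) \Bigr) \tag{13b} \\
  &= \sum_{i=1}^{k} \bigl( \alpha(y) \downarrow \Gamma_{\bar{d}}(q, \mathcal{D}_i) \bigr) - \cdots
     + (-1)^{k-1} \Bigl( \alpha(y) \downarrow \Gamma_{\bar{d}}\bigl(q, \textstyle\bigcap_{i=1}^{k} \mathcal{D}_i\bigr) \Bigr) \tag{13c} \\
  &= \sum_{i=1}^{k} \bigl( \alpha(y) \downarrow \Gamma_{\bar{d}}(q', \mathcal{D}_i) \bigr) - \cdots
     + (-1)^{k-1} \Bigl( \alpha(y) \downarrow \Gamma_{\bar{d}}\bigl(q', \textstyle\bigcap_{i=1}^{k} \mathcal{D}_i\bigr) \Bigr) \tag{13d} \\
  &= \sum_{i=1}^{k} \bigl( \alpha(y) \downarrow \Gamma_{\bar{d}}(q', \mathcal{D}_i) \bigr) - \cdots
     + (-1)^{k-1} \Bigl( \alpha(y) \downarrow \bigcap_{i=1}^{k} \Gamma_{\bar{d}}(q', \mathcal{D}_i) \Bigr) \tag{13e} \\
  &= \alpha(y) \downarrow \bigcup_{i=1}^{k} \Gamma_{\bar{d}}(q', \mathcal{D}_i) \tag{13f} \\
  &= \alpha(y) \downarrow \Gamma_{\bar{d}}(q', \mathcal{D}), \tag{13g}
\end{align}

where Equations (13a) and (13g) hold because of Property 2 of decompositions, Equations (13b)
and (13f) hold because of Proposition 5.2, Equations (13c) and (13e) hold because of Property 3 of
decompositions, and Equation (13d) holds because $q$ and $q'$ are locally equivalent and the databases
$\mathcal{D}_i$ (and hence also their intersections) contain at most $\tau(q, q')$ constants. □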
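The two derivations can be checked on a toy instance. The following Python sketch is only an illustration under simplifying assumptions: the sets `G1`, `G2`, `G3` stand in for the assignment sets $\Gamma_{\bar{d}}(q, \mathcal{D}_i)$, each assignment is reduced to a hypothetical pair (identifier, value of y), and max and sum are used as representatives of idempotent monoid functions and group functions, respectively.

```python
from itertools import combinations

def agg_max(assignments):
    """max over the y-values of a set of assignments (idempotent monoid function)."""
    return max((y for _, y in assignments), default=float("-inf"))

def agg_sum(assignments):
    """sum over the y-values of a set of assignments (group aggregation function)."""
    return sum(y for _, y in assignments)

def inclusion_exclusion(agg, parts):
    """Combine the aggregates of all intersections of the parts with alternating signs."""
    total = 0
    for r in range(1, len(parts) + 1):
        sign = (-1) ** (r - 1)
        for combo in combinations(parts, r):
            total += sign * agg(set.intersection(*combo))
    return total

# Hypothetical assignment sets; the parts overlap, as decompositions may.
G1 = {("g1", 3), ("g2", 5)}
G2 = {("g2", 5), ("g3", -2)}
G3 = {("g3", -2), ("g4", 5)}
parts = [G1, G2, G3]
union = G1 | G2 | G3

# Idempotent case, in the spirit of (12b): overlaps are harmless.
max_from_parts = max(agg_max(p) for p in parts)

# Group case, in the spirit of (13b): overlaps are corrected by inclusion-exclusion.
sum_from_parts = inclusion_exclusion(agg_sum, parts)
```

In this instance both `max_from_parts` and `sum_from_parts` coincide with the aggregate computed directly over `union`, whereas naively adding the three partial sums would count the shared assignments twice.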

The aggregation function prod is not a decomposable aggregation function over $\mathbb{Q}$. However,
prod is decomposable over $\mathbb{Q}^{\pm}$, i.e., the rational numbers without the element 0. It turns out
that this is sufficient in order to reduce equivalence to local equivalence for prod, defined over the
rational numbers.

Theorem 6.6 (Reduction to Local Equivalence for Product) Suppose $q(\bar{x}, \mathrm{prod}(y))$ and
$q'(\bar{x}, \mathrm{prod}(y))$ are disjunctive prod-queries, defined over $\mathbb{Q}$. Then $q$ and $q'$ are equivalent if and
only if they are locally equivalent.

Proof. As before, we only have to show that local equivalence implies equivalence. Suppose
therefore that $q$ and $q'$ agree on all databases whose carrier has at most $\tau(q, q')$ elements. Let $\mathcal{D}$
be any database and $\bar{d}$ be a tuple of constants. It suffices to show that

\begin{equation}
\mathrm{prod}(y) \downarrow \Gamma_{\bar{d}}(q, \mathcal{D}) = \mathrm{prod}(y) \downarrow \Gamma_{\bar{d}}(q', \mathcal{D}). \tag{14}
\end{equation}



Let $(\mathcal{D}_i)_{i=1}^{k}$ be a decomposition of $\mathcal{D}$ with respect to $q$, $q'$ and $\bar{d}$. We distinguish between
three cases.

Case 1. Suppose that there is an assignment $\gamma \in \Gamma_{\bar{d}}(q, \mathcal{D})$ that maps $y$ to 0. Then, $q$ retrieves
the aggregate value 0 for $\bar{d}$ over $\mathcal{D}$, i.e., $\mathrm{prod}(y) \downarrow \Gamma_{\bar{d}}(q, \mathcal{D}) = 0$. By Property 2 of decompositions,
there is a database $\mathcal{D}_\gamma \in (\mathcal{D}_i)_{i=1}^{k}$ such that $\gamma \in \Gamma_{\bar{d}}(q, \mathcal{D}_\gamma)$. Note that $q$ returns the aggregate
value 0 for $\bar{d}$ over $\mathcal{D}_\gamma$. By Property 1 of decompositions, $\mathcal{D}_\gamma$ has at most $\tau(q, q')$ elements. By our
assumption, $q$ and $q'$ are locally equivalent. Therefore, $q'$ must return the aggregate value 0 for
$\bar{d}$ over $\mathcal{D}_\gamma$, which means that some assignment in $\Gamma_{\bar{d}}(q', \mathcal{D}_\gamma)$ maps $y$ to 0. Hence, by applying
Property 2 of decompositions once more, we derive that $q'$ retrieves the aggregate value 0 for $\bar{d}$
over $\mathcal{D}$.

Case 2. Suppose that there is an assignment $\gamma \in \Gamma_{\bar{d}}(q', \mathcal{D})$ that maps $y$ to 0. By analogous
arguments to the previous case, we can show that both $q$ and $q'$ retrieve the aggregate value 0 for
$\bar{d}$ over $\mathcal{D}$.

Case 3. Suppose that there is no assignment in $\Gamma_{\bar{d}}(q, \mathcal{D})$ that maps $y$ to 0. Similarly, suppose
that there is no assignment in $\Gamma_{\bar{d}}(q', \mathcal{D})$ that maps $y$ to 0. Then, the aggregation function prod
could just as well have been defined over $\mathbb{Q}^{\pm}$. In this case, prod is a decomposable aggregation
function and the arguments used in Equations (13a) through (13g) in the proof of Theorem 6.5
apply. Therefore, $q$ and $q'$ return the same aggregate value for $\bar{d}$ over $\mathcal{D}$ as required.

Thus, we have proved that Equation (14) holds in all possible cases. □
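The case distinction can be mirrored in a few lines of Python. This is a sketch under simplifying assumptions, not part of the proof: the parts stand in for the assignment sets over the databases of a decomposition, each assignment is a hypothetical pair (identifier, value of y), and the function name `prod_over_union` is ours.

```python
from fractions import Fraction
from itertools import combinations

def prod(values):
    """Product of an iterable of rationals (1 for the empty product)."""
    result = Fraction(1)
    for v in values:
        result *= v
    return result

def prod_over_union(parts):
    """prod over the union of the parts, computed only from the parts and their intersections."""
    # Cases 1 and 2: some assignment maps y to 0, so the overall product is 0.
    if any(y == 0 for part in parts for _, y in part):
        return Fraction(0)
    # Case 3: no zeros occur, so inclusion-exclusion works in the
    # multiplicative group of the nonzero rationals (division is the inverse).
    result = Fraction(1)
    for r in range(1, len(parts) + 1):
        for combo in combinations(parts, r):
            term = prod(y for _, y in set.intersection(*combo))
            result = result * term if r % 2 == 1 else result / term
    return result

# Hypothetical overlapping assignment sets.
G1 = {("g1", Fraction(2)), ("g2", Fraction(-3))}
G2 = {("g2", Fraction(-3)), ("g3", Fraction(1, 2))}
G3 = {("g4", Fraction(0))}

no_zero = prod_over_union([G1, G2])
with_zero = prod_over_union([G1, G2, G3])
```

The zero test is what makes the separate treatment necessary: once a 0 occurs, the division in the inclusion-exclusion step is no longer available.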

The following result follows directly from Theorem 6.5.

Corollary 6.7 (Local Equivalence and Equivalence) Suppose $\alpha$ is a decomposable aggregation
function. If local equivalence is decidable for disjunctive $\alpha$-queries, then equivalence is also
decidable.

We have noted earlier that bounded equivalence of $\alpha$-queries can be checked in double
exponential time if the corresponding ordered identities can be decided in double exponential
time. Hence, if $\alpha$ is also decomposable, we derive a double exponential upper bound
on checking for equivalence of $\alpha$-queries.



From Theorems 6.5 and 6.6 and Corollary 4.10 we derive the following result.



Corollary 6.8 (Decidable Query Classes) Equivalence of disjunctive aggregate queries is
decidable for the aggregation functions max, top2, count, parity, and sum over both the integers and
the rational numbers. In addition, equivalence of disjunctive prod-queries is decidable over the
rational numbers.



7 Equivalence of Conjunctive Quasilinear Queries 



A positive conjunctive query $q$ is linear if no predicate occurs more than once in $q$ [13]. We
generalize this by defining a conjunctive query to be quasilinear if no predicate that occurs in a
positive literal occurs more than once. Thus, in a quasilinear query, no predicate occurs in both a
positive and a negated literal, and no predicate occurs more than once in a positive literal. In this
section we show that for a wide range of quasilinear queries, equivalence coincides with isomorphism.



In Section 4.2, we defined reduced sets of terms with respect to a complete ordering. In a similar
spirit, we now introduce reduced conjunctions of comparisons. A conjunction of comparisons $C$ is
reduced with respect to a domain $\mathcal{I}$ if

• there are no variables $x$ and $y$ occurring in $C$ such that $C \models_{\mathcal{I}} x = y$;
• there is no variable $x$ occurring in $C$ such that $C \models_{\mathcal{I}} x = d$ for a constant $d \in \mathcal{I}$.

We say that a conjunctive query is reduced with respect to $\mathcal{I}$ if its comparisons are reduced with
respect to $\mathcal{I}$. If the domain is clear from the context, we will simply say that a query is reduced,
without specifying the domain.
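To make the definition concrete, the following Python sketch detects the equalities that a conjunction of comparisons forces. It is a minimal illustration under assumptions that are ours: the conjunction is satisfiable, it uses only the operators `<=` and `=`, and the domain is a dense order such as the rationals. Variables are represented as strings and constants as numbers; the names `forced_equalities` and `is_reduced` are hypothetical.

```python
def is_var(term):
    """Variables are strings; constants are numbers (a representation chosen for this sketch)."""
    return isinstance(term, str)

def forced_equalities(comparisons):
    """Pairs (variable, term) that must be equal in every solution of the conjunction."""
    terms = sorted({t for lhs, _, rhs in comparisons for t in (lhs, rhs)}, key=str)
    leq = {(t, t) for t in terms}
    for lhs, op, rhs in comparisons:
        leq.add((lhs, rhs))
        if op == "=":
            leq.add((rhs, lhs))
    # Constants are ordered by their numeric value.
    consts = [t for t in terms if not is_var(t)]
    leq |= {(c, d) for c in consts for d in consts if c <= d}
    # Transitive closure of the "less than or equal" relation.
    for k in terms:
        for i in terms:
            for j in terms:
                if (i, k) in leq and (k, j) in leq:
                    leq.add((i, j))
    # x = t is forced exactly when both x <= t and t <= x are derivable.
    return {(s, t) for s in terms for t in terms
            if is_var(s) and s != t and (s, t) in leq and (t, s) in leq}

def is_reduced(comparisons):
    """A conjunction is reduced if it forces no variable to equal another term."""
    return not forced_equalities(comparisons)

cyclic = [("x", "<=", "y"), ("y", "<=", "z"), ("z", "<=", "x")]
pinned = [("x", "<=", 3), (3, "<=", "x")]
loose = [("x", "<=", "y"), ("y", "<=", 5)]
```

Here `cyclic` forces x, y and z to coincide and `pinned` forces x = 3, so neither is reduced, while `loose` is. The closure runs in time cubic in the number of terms, in line with the polynomial bound mentioned in the text; strict comparisons and discrete domains such as the integers would need additional care that this sketch omits.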

We have shown in earlier work that for any positive conjunctive query, one can compute in polynomial
time an equivalent reduced conjunctive query. This still holds when the query contains negated
atoms. Note that the head of the equivalent reduced query may contain constants, even if the head
of the original non-reduced query does not.

Let $q(\bar{s}, \alpha(\bar{t})) \leftarrow P \wedge N \wedge C$ and $q'(\bar{s}', \alpha(\bar{t}')) \leftarrow P' \wedge N' \wedge C'$ be conjunctive aggregate queries
with comparisons, ranging over the domain $\mathcal{I}$. We use $P$ and $P'$ to denote the positive atoms, $N$
and $N'$ to denote the negated atoms, and $C$ and $C'$ to denote the comparisons. A homomorphism
from $q'$ to $q$ is a substitution $\theta$ of the variables in $q'$ by terms in $q$ such that

1. $\theta(\bar{s}') = \bar{s}$ and $\theta(\bar{t}') = \bar{t}$;

2. $\theta(a')$ is in $P$ for every positive relational atom $a'$ of $P'$;

3. $\theta(a')$ is in $N$ for every negated relational atom $a'$ of $N'$;

4. $C \models_{\mathcal{I}} \theta(s') \mathrel{\rho} \theta(t')$ for every comparison $s' \mathrel{\rho} t'$ in $C'$.

A homomorphism is an isomorphism if it is bijective and if its inverse is also a homomorphism. The
queries $q'$ and $q$ are isomorphic if there is an isomorphism from $q'$ to $q$. In [13] we have also shown
that reduced linear max, count and sum queries are equivalent if and only if they are isomorphic.
For queries with negated literals, we can generalize this result to quasilinear queries.

We say that a class of queries $\mathcal{Q}$ is proper if for satisfiable queries in $\mathcal{Q}$ equivalence implies
isomorphism, that is, if for any two satisfiable reduced queries $q, q' \in \mathcal{Q}$ it is the case that $q$ and $q'$
are only equivalent if they are isomorphic. For every aggregation function $\alpha$ we denote by $\mathcal{L}(\alpha)$ the
class of linear $\alpha$-queries. Similarly, we denote by $\mathcal{QL}(\alpha)$ the class of quasilinear $\alpha$-queries.

Theorem 7.1 (Quasilinear and Linear Queries) Let $\alpha$ be an aggregation function. Suppose
that the class $\mathcal{L}(\alpha)$ is proper. Then, the class $\mathcal{QL}(\alpha)$ is also proper.

Proof. Consider the satisfiable reduced queries

\begin{align*}
q(\bar{s}, \alpha(\bar{t})) &\leftarrow P \wedge N \wedge C \\
q'(\bar{s}, \alpha(\bar{t})) &\leftarrow P' \wedge N' \wedge C'
\end{align*}

where

• $P$ and $P'$ are conjunctions of positive relational atoms;

• $N$ and $N'$ are conjunctions of negated relational atoms;

• $C$ and $C'$ are conjunctions of comparisons.

Suppose that $\mathcal{L}(\alpha)$ is proper. Suppose that $q$ is not isomorphic to $q'$. We show that $q$ is not
equivalent to $q'$. Note that we can assume that $q$ and $q'$ have the same heads since otherwise they
are obviously not equivalent.

We introduce the positive parts of $q$ and $q'$ as the queries $q^{+}$ and ${q'}^{+}$, defined as

\begin{align*}
q^{+}(\bar{s}, \alpha(\bar{t})) &\leftarrow P \wedge C \\
{q'}^{+}(\bar{s}, \alpha(\bar{t})) &\leftarrow P' \wedge C'.
\end{align*}

We consider two cases.

Case 1. Suppose that $q^{+}$ is not isomorphic to ${q'}^{+}$. Hence, $q^{+}$ and ${q'}^{+}$ are not equivalent. Let
$\mathcal{D}$ be a database for which $q^{+}$ and ${q'}^{+}$ return different values. We may assume, without loss of
generality, that $\mathcal{D}$ only contains atoms with predicates appearing in $P$ or in $P'$.

If there is a predicate $p$ that appears in an atom of $P$, but not in $P'$, then clearly $q$ and $q'$ cannot
be equivalent, since we could create a database that satisfies $q'$ and does not contain any atom with
predicate $p$. Similarly, if there is a predicate that appears in $P'$, but not in $P$, then $q$ and $q'$ cannot
be equivalent. Hence, we may assume that the set of predicates of atoms in $P$ is identical to the
set of predicates of atoms in $P'$. Thus, there is no atom in $\mathcal{D}$ containing a predicate appearing in
$N$ or $N'$.

Thus, $(q^{+})^{\mathcal{D}} = q^{\mathcal{D}}$ and $({q'}^{+})^{\mathcal{D}} = {q'}^{\mathcal{D}}$. We conclude that $\mathcal{D}$ is a counterexample for the equivalence
of $q$ and $q'$.

Case 2. Suppose that $q^{+}$ is isomorphic to ${q'}^{+}$. Since $q^{+}$ and ${q'}^{+}$ are linear, there exists only
one isomorphism between them, say $\theta$. Note that $\theta$ is defined on all the variables in $q$, since $q$ is
a safe query. By our assumption, $q$ is not isomorphic to $q'$. Thus, $\theta(N) \neq N'$. Suppose, without
loss of generality, that $\neg a$ appears in $N$ and $\theta(\neg a)$ does not appear in $N'$. Let $\gamma$ be a mapping of
the variables in $q$ to constants, such that $\gamma$ is consistent with the comparisons in $q$. We define a
database $\mathcal{D} := \{\, \gamma(b) \mid b \in P \,\} \cup \{\, \gamma(a) \,\}$. Clearly, $q$ does not return a grouping value for $\gamma(\bar{s})$ over
$\mathcal{D}$ whereas $q'$ does return a grouping value for $\gamma(\bar{s})$. Thus, $q$ and $q'$ are not equivalent.

This completes the proof. □



Now, it follows from our results in [13] that for quasilinear queries with the aggregate functions
max, sum and count, equivalence boils down to isomorphism. In a similar fashion to the proofs
there, we can extend our results to additional aggregate functions.

A bag $B$ is a singleton if it contains exactly one value. We say that an aggregation function $\alpha$
is a singleton-determining aggregation function if for all singleton bags $B$ and $B'$ we have that

\[ \alpha(B) = \alpha(B') \iff B = B'. \]

Clearly max, top2, sum, prod and avg are singleton-determining aggregation functions. Note
that count and parity are nullary aggregate functions. Thus, they are defined over a domain that
contains only a single value, the empty tuple. Hence, count and parity are also singleton-determining
aggregation functions. However, cntd is not a singleton-determining aggregation function, since it
returns 1 on every singleton bag.
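The definition can be tried out on a finite sample. The following Python sketch is illustrative only: the sample domain is arbitrary, bags are modelled as lists, and the rendering of top2 as "the two largest distinct values" is our assumption rather than the paper's definition. A finite check can refute the property, but it can only suggest that the property holds.

```python
from fractions import Fraction
from math import prod

# Aggregation functions over bags of numbers, with bags modelled as Python lists.
AGGREGATES = {
    "max": max,
    "sum": sum,
    "prod": prod,
    "avg": lambda bag: Fraction(sum(bag), len(bag)),
    "top2": lambda bag: tuple(sorted(set(bag), reverse=True)[:2]),  # assumed definition
    "cntd": lambda bag: len(set(bag)),
}

def is_singleton_determining(agg, domain):
    """Check agg([d]) == agg([e]) <=> d == e for all singleton bags over a sample domain."""
    return all((agg([d]) == agg([e])) == (d == e) for d in domain for e in domain)

sample = [-2, 0, 1, 3]
results = {name: is_singleton_determining(f, sample) for name, f in AGGREGATES.items()}

# Nullary functions such as count range over bags of empty tuples, so there is
# only one singleton bag and the condition holds trivially.
count_ok = is_singleton_determining(len, [()])
```

On this sample every function except cntd passes the check; cntd fails because it maps all singleton bags to the same value.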

Theorem 7.2 (Equivalence of Quasilinear Queries) Let $\alpha$ be an aggregation function. Then,
the following conditions are equivalent:

1. $\alpha$ is singleton-determining;

2. $\mathcal{L}(\alpha)$ is proper;

3. $\mathcal{QL}(\alpha)$ is proper.



Proof. The direction "$(2) \Rightarrow (3)$" holds by Theorem 7.1. Clearly, "$(3) \Rightarrow (2)$" holds since
$\mathcal{L}(\alpha) \subseteq \mathcal{QL}(\alpha)$. Thus, we need only show that "$(1) \Rightarrow (2)$" and "$(2) \Rightarrow (1)$".

"$(1) \Rightarrow (2)$" Suppose that $\alpha$ is a singleton-determining aggregation function. We show that
$\mathcal{L}(\alpha)$ is proper. To this end, let $q(\bar{s}, \alpha(\bar{t})) \leftarrow A$ and $q'(\bar{s}', \alpha(\bar{t}')) \leftarrow A'$ be satisfiable reduced linear
$\alpha$-queries. Suppose that $q \equiv q'$. We will show that $q$ and $q'$ are isomorphic.

It has been shown that positive linear non-aggregate queries without comparisons are
set-equivalent if and only if they are isomorphic. This still holds even if the queries have comparisons.⁴
We associate with $q$ a non-aggregate query $\breve{q}$, called the non-aggregate projection of $q$, which is
derived from $q$ by simply removing the aggregate term from the head of $q$. Thus, $\breve{q}$ has the form

\[ \breve{q}(\bar{s}) \leftarrow A. \]

Since $q \equiv q'$, they return values for the same grouping tuples. Thus, $\breve{q}$ is set-equivalent to $\breve{q}'$.
Hence, $\breve{q}$ is isomorphic to $\breve{q}'$. Let $\theta$ be the isomorphism from $\breve{q}'$ to $\breve{q}$. If $\alpha$ is a nullary aggregation
function, then $\theta$ is an isomorphism from $q'$ to $q$. Suppose that $\alpha$ is not a nullary aggregation
function.

Let $\gamma$ be an instantiation of the terms in $q$ that satisfies the comparisons in $q$ and maps each
term to a different value. We construct a database $\mathcal{D}$ out of $q$ by applying $\gamma$ to the relational part
of $q$.

Clearly, the only satisfying assignment of $q$ to the constants in $\mathcal{D}$ is exactly $\gamma$. Thus, $q$ retrieves
$(\gamma(\bar{s}), \alpha(\gamma(\bar{t})))$. The only satisfying assignment of $q'$ is $\gamma \circ \theta$. Therefore, $q'$ returns
$(\gamma \circ \theta(\bar{s}'), \alpha(\gamma \circ \theta(\bar{t}')))$. Note that since $\theta$ is an isomorphism from $\breve{q}'$ to $\breve{q}$, it holds that
$\gamma \circ \theta(\bar{s}') = \gamma(\bar{s})$.

Recall that $\alpha$ is a singleton-determining aggregation function. Therefore, $\alpha(\gamma \circ \theta(\bar{t}')) = \alpha(\gamma(\bar{t}))$
if and only if $\gamma \circ \theta(\bar{t}') = \gamma(\bar{t})$. The instantiation $\gamma$ is an injection, thus $\gamma \circ \theta(\bar{t}') = \gamma(\bar{t})$ if and only if
$\theta(\bar{t}') = \bar{t}$. This must hold since $q \equiv q'$. Therefore, $\theta$ is an isomorphism from $q'$ to $q$.

"$(2) \Rightarrow (1)$" Suppose that $\alpha$ is not a singleton-determining aggregation function. We show that
$\mathcal{L}(\alpha)$ is not proper. To this end, we create linear $\alpha$-queries $q$ and $q'$ such that $q \equiv q'$, but $q$ and $q'$
are not isomorphic.

Since $\alpha$ is not a singleton-determining aggregation function, there are singleton bags $B = \{\!\{ d \}\!\}$
and $B' = \{\!\{ d' \}\!\}$ such that $d \neq d'$ and $\alpha(B) = \alpha(B')$. We define the queries

\begin{align*}
q(\alpha(d)) &\leftarrow p(d) \wedge p(d') \\
q'(\alpha(d')) &\leftarrow p(d) \wedge p(d').
\end{align*}

Clearly $q$ and $q'$ are not isomorphic, but they are equivalent. □



Corollary 7.3 (Equivalence and Isomorphism) The classes of quasilinear max, top2, count,
sum, prod, parity and avg queries are proper.

Proof. This result follows from the fact that all the aggregation functions above are
singleton-determining and from Theorem 7.2. □



For cntd a similar result can be shown for common cases. 

Theorem 7.4 (Equivalence of Quasilinear Count-Distinct Queries) Let $q$ and $q'$ be
satisfiable reduced quasilinear cntd-queries. Moreover, suppose that

• the comparisons in $q$ and $q'$ use only <, > and

• $q$ and $q'$ either range over the rational numbers or do not have constants.

Then $q$ and $q'$ are equivalent if and only if they are isomorphic.

Proof. This follows directly from the fact that such queries, when positive, are equivalent if and
only if they are isomorphic [13] and from Theorem 7.1. □

4 We are not aware that this result has been published, but it appears in the extended version of [psf.

Table 1: Properties of aggregation functions



Since isomorphism of quasilinear queries can be checked in polynomial time, we derive the
following complexity result.

Corollary 7.5 (Polynomiality) The equivalence problem for the class of quasilinear $\alpha$-queries is
decidable in polynomial time if $\alpha$ is one of the aggregation functions max, top2, count, sum, prod,
parity, or avg, and for cntd-queries that satisfy the conditions of Theorem 7.4.
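The reason isomorphism is easy to check for quasilinear queries is that each positive predicate occurs exactly once, so the candidate mapping is forced. The Python sketch below illustrates this under simplifying assumptions of our own: queries have no comparisons and no constants, they are safe, and a query is represented as a hypothetical triple (head terms, positive atoms as a dict from predicate to argument tuple, negated atoms as a set of (predicate, argument tuple) pairs).

```python
def find_isomorphism(q1, q2):
    """Return the variable mapping from q1 to q2 if the queries are isomorphic, else None."""
    head1, pos1, neg1 = q1
    head2, pos2, neg2 = q2
    if pos1.keys() != pos2.keys() or len(head1) != len(head2):
        return None
    # Each positive predicate occurs once, so atoms can only be matched by predicate.
    pairs = list(zip(head1, head2))
    for pred in pos1:
        if len(pos1[pred]) != len(pos2[pred]):
            return None
        pairs += zip(pos1[pred], pos2[pred])
    theta = {}
    for x, y in pairs:
        if theta.setdefault(x, y) != y:
            return None  # x would need two different images
    if len(set(theta.values())) != len(theta):
        return None  # the mapping is not injective
    # Safety guarantees that every variable of a negated atom is in theta.
    image = {(p, tuple(theta[v] for v in args)) for p, args in neg1}
    return theta if image == set(neg2) else None

# q(x, y) <- p(x, y), r(y, z), not s(z), and two candidate partners.
q = (("x", "y"), {"p": ("x", "y"), "r": ("y", "z")}, {("s", ("z",))})
q_renamed = (("u", "v"), {"p": ("u", "v"), "r": ("v", "w")}, {("s", ("w",))})
q_other = (("u", "v"), {"p": ("u", "v"), "r": ("v", "w")}, {("s", ("v",))})
```

The procedure makes a single pass over the atoms, so its running time is polynomial (in fact near-linear) in the size of the queries. Handling comparisons would additionally require the entailment checks from the definition of a homomorphism.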



8 Conclusion 

Necessary and complete conditions for the decidability of bounded equivalence of disjunctive
aggregate queries with negation have been presented. This problem has been shown to be decidable
for a wide class of aggregation functions. Equivalence of aggregate queries with negation has been
reduced to a special case of bounded equivalence, called local equivalence, for decomposable
aggregation functions. We have also shown that equivalence can be decided in polynomial time for the
common case of quasilinear queries.

Novel proof techniques have been presented. One example is the application of the Principle
of Inclusion and Exclusion to the case of group aggregation functions. Our results are couched
in terms of abstract characterizations of aggregation functions. Thus, the results presented are
easily extendible to additional aggregation functions. In Table 1 we summarize the properties that
hold for each of the aggregation functions considered in this paper. Table 2 shows our decidability
results for these aggregation functions.

Bag-set semantics was introduced to give a formal account of the way in which SQL
queries are executed: they return not a set of tuples but a multiset. It is easy to see that two
non-aggregate queries are equivalent under bag-set semantics if and only if the aggregate queries
obtained by adding the function count are equivalent. Thus, our results on count-queries directly
carry over to non-aggregate queries that are evaluated under bag-set semantics. This is a significant
contribution to the understanding of SQL queries. Moreover, these results can easily be extended
to non-aggregate queries evaluated under bag semantics, thereby solving an additional open
problem.

Table 2: Properties of classes of queries
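The correspondence between bag-set semantics and count can be seen on a small instance. The Python sketch below is an illustration only: the relations `p` and `r`, their contents, and the query q(x) ← p(x, y) ∧ r(y) are hypothetical.

```python
from collections import Counter

# Hypothetical set database with a binary relation p and a unary relation r.
p = {(1, "a"), (1, "b"), (2, "a"), (3, "c")}
r = {("a",), ("b",)}

# Satisfying assignments of the body p(x, y) AND r(y).
assignments = [(x, y) for (x, y) in p if (y,) in r]

# Bag-set semantics: project each assignment onto the head q(x), keeping duplicates.
bag_answer = Counter(x for x, _ in assignments)

# Aggregate query q(x, count) over the same body: one row per group with its count.
count_answer = {}
for x, _ in assignments:
    count_answer[x] = count_answer.get(x, 0) + 1
```

The multiplicity of each tuple in `bag_answer` is exactly the count reported for the corresponding group in `count_answer`, which is why two queries agree under bag-set semantics precisely when their count versions agree.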

Concepts seemingly similar to the ones introduced in the present paper have been investigated
in earlier work on logics with aggregation. In particular, the authors considered aggregation
functions defined in terms of commutative monoids. However, the purpose of that research was to
study the expressivity of logics that extend first-order logic by aggregation. It is shown there that
formulas in those extended logics are Hanf-local and Gaifman-local. Intuitively, this means that
whether or not a formula is true for a tuple $\bar{d}$ in a structure depends only on that part of the
structure that is "close" to $\bar{d}$. A class of formulas that is Hanf- or Gaifman-local need not be
decidable. In addition, the authors only considered monoids over the rational numbers, which
excludes functions such as topK and parity.

We leave for future research the problem of deciding equivalence among avg and cntd queries 
as well as equivalence of aggregate queries with a HAVING clause. Finding tight upper and lower 
bounds for equivalence, as well as the adaptation of our results to the view usability problem are 
other important open problems. 